We develop information field theory (IFT) as a means of Bayesian inference on spatially distributed signals, the information fields. A didactical approach is attempted. Starting from general considerations on the nature of measurements, signals, noise, and their relation to a physical reality, we derive the information Hamiltonian, the source field, propagator, and interaction terms. Free IFT reproduces the well-known Wiener-filter theory. Interacting IFT can be diagrammatically expanded, for which we provide the Feynman rules in position-, Fourier-, and spherical-harmonics space, and the Boltzmann-Shannon information measure. The theory should be applicable in many fields. However, here, two cosmological signal recovery problems are discussed in their IFT formulation. 1) Reconstruction of the cosmic large-scale structure matter distribution from discrete galaxy counts in incomplete galaxy surveys within a simple model of galaxy formation. We show that a Gaussian signal, which should resemble the initial density perturbations of the Universe, observed with a strongly non-linear, incomplete and Poissonian-noise affected response, as the processes of structure and galaxy formation and observations provide, can be reconstructed by virtue of a response-renormalization flow equation. 2) We design a filter to detect local non-linearities in the cosmic microwave background, which are predicted from some Early-Universe inflationary scenarios, and expected due to measurement imperfections. This filter is the optimal Bayes' estimator up to linear order in the non-linearity parameter and can be used even to construct sky maps of non-linearities in the data.



I. INTRODUCTION 
A. Motivation 

The optimal extraction and restoration of information from data on spatially distributed quantities like the cosmic large-scale structure (LSS) or the cosmic microwave background (CMB) temperature fluctuations in cosmology, but also on many other signals in physics and related fields, is essential for any quantitative, data-driven scientific inference. The problem of how to design such methods poses many technical and even conceptual difficulties, which have led to a large number of recipes and methodologies.

Here, we address such problems from a strictly information theoretical point of view. We show, as others have done before, that information theory for distributed quantities leads to a statistical field theory, which we name information field theory (IFT). In contrast to previous works, which mostly treat such problems on a classical field level, as will be detailed later, here we take full advantage of the existing field theoretical apparatus to treat interacting and non-classical fields. Thus, we show how to use diagrammatic perturbation theory and renormalization flows in order to construct optimal signal recovering algorithms and to calculate moments of their uncertainties. Non-classicality manifests itself as quantum and statistical fluctuations in quantum and statistical field theory (QFT & SFT), and very similarly as uncertainty in IFT.

The information theoretical perspective on signal inference problems has technical advantages, since it permits the design of information-yield optimized algorithms and experimental setups. Moreover, it provides deeper insight into the mechanisms of knowledge accumulation, its underlying information flows, and its dependence on data models, prior knowledge and assumptions than purely empirical evaluations of ad hoc algorithms alone could provide.

We therefore hope that our work is of interest for two types of readers. The first are applied scientists, who are mainly interested in the practical aspect of IFT since they are facing a concrete inverse problem for a spatially distributed quantity, especially but not exclusively in cosmology. The second are more philosophically or theoretically inclined scientists, for whom IFT may serve as a framework to understand and classify many of the existing methods of signal extraction and reception. Since we expect that many interested readers are not very familiar with field theoretical formalisms, we introduce some of their basic mathematical concepts. Due to this anticipated non-uniform readership, not everything in this article might be of everyone's interest, and therefore we provide in the following a short overview of the structure and content of the article.



B. Overview of the work 

The remainder of this introduction section contains a detailed discussion of the previous work on signal inference theory as well as a very brief introduction to the relevant works on the cosmic LSS and the CMB. The main part of this article falls into two categories: abstract IFT and its application. The concepts of IFT are introduced in Sec. II, where Bayesian methodology, the distinction of physical and information fields, the definition of signal response and noise, as well as the design of signal spaces are discussed. The basic IFT formalism including the free theory is introduced in Sec. III, which, according to our judgement, summarizes and unifies the previous knowledge on IFT before this paper. An impatient reader, only interested in applying IFT and not worrying about concepts, may start reading in Sec. III. From Sec. IV on, the new results of this work are presented, starting with the discussion of interacting information fields, their Hamiltonians and Feynman rules, and the Boltzmann-Shannon information measure. The normalizability of sensibly constructed IFTs is shown there, and the classical information field equation is presented. A step-by-step recipe of how to derive and implement an IFT algorithm is also provided.

Details of the notation, if not defined in the main text, can be found in the Appendix.

Applications of the theory are provided in the following two sections, which can be skipped by a reader interested only in the general theoretical framework. Although specific inference problems are addressed, they should serve as a blueprint for tackling similar problems. In Sec. V the problem of the reconstruction of the cosmic matter distribution from galaxy surveys is analyzed in terms of a Poissonian data model. In Sec. VI we derive an optimal estimator for non-Gaussianity in the CMB, and show how it can be generalized to map potential non-Gaussianities in the CMB sky. Our summary and outlook can be found in Sec. VII.



C. Previous works 

The work presented here tries to unify information theory and statistical field theory in order to provide a conceptual framework in which optimal tools for cosmological signal analysis can be derived, as well as for inference problems in other disciplines. Below, we provide very brief introductions to each of the required fields¹ (information theory, image reconstruction, statistical field theory, cosmological large-scale structure, and cosmic microwave background), for the orientation of non-expert readers. An expert in any of these fields might decide to skip the corresponding sections.



¹ This work has tremendously benefited in a direct and indirect way from a large number of previous publications in those fields. We, the authors, have to apologize for being unable to give full credit to all relevant former works in those fields, concentrating only on a brief summary of the papers more or less directly influencing this work. This collection is obviously highly biased towards the cosmological literature due to our main scientific interests and expertise, and definitely incomplete.



1. Information theory and Bayesian inference 

The foundation of information theory was laid by the work of Bayes [1] on probability theory, in which the celebrated Bayes theorem was presented. The theorem itself (see Eq. (7)) is a simple rule for conditional probabilities. It only unfolds its power for inference problems if used with belief or knowledge states, described by conditional probabilities.

The advent of modern information theory is probably best dated by the work of Shannon [2, 3] on the concept of information measure, being the negative Boltzmann entropy, and the work of Jaynes, combining the language of statistical mechanics and Bayes probability theory and applying it to knowledge uncertainties [4, 5, 6, 7, 8, 9, 10]. The required numerical evaluation of Bayesian probability integrals often suffered from the curse of high dimensionality. The standard recipe against this, still in massive use today, is importance sampling via Markov-Chain Monte-Carlo methods (MCMC), following the ideas of Metropolis et al., Hastings, and Geman and Geman [13], where the latter authors already had image reconstruction applications in mind. The Hamiltonian MCMC methods [14], in which the phase-space sampling partly follows Hamiltonian dynamics, are also of relevance here. There, the Hamiltonian is introduced as the negative logarithm of the probability, as we do in this work.
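To make the notion of a Hamiltonian as the negative logarithm of a probability concrete, here is a minimal random-walk Metropolis sampler in Python. It assumes a one-dimensional standard-normal toy target, so that H(s) = s²/2 up to a constant; the function names `hamiltonian` and `metropolis` are illustrative. It is the plain Metropolis scheme, not the Hamiltonian (hybrid) MCMC of [14].

```python
import numpy as np

def hamiltonian(s):
    """Toy information Hamiltonian H(s) = -ln P(s) for a standard normal, up to a constant."""
    return 0.5 * s ** 2

def metropolis(hamiltonian, s0, n_steps, step, rng):
    """Random-walk Metropolis sampler targeting P(s) proportional to exp(-H(s))."""
    samples = np.empty(n_steps)
    s, h = s0, hamiltonian(s0)
    for i in range(n_steps):
        proposal = s + step * rng.normal()
        h_new = hamiltonian(proposal)
        # Accept with probability min(1, exp(-(H_new - H_old))).
        if np.log(rng.uniform()) < h - h_new:
            s, h = proposal, h_new
        samples[i] = s
    return samples

rng = np.random.default_rng(42)
chain = metropolis(hamiltonian, s0=0.0, n_steps=50_000, step=2.0, rng=rng)
print(chain.mean(), chain.var())  # close to 0 and 1 for this target
```

Only differences of H enter the acceptance rule, which is why the normalization of P(s) never has to be computed.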

With such tools, higher dimensional problems, as present in signal restoration, could and can be tackled, however at the price of introducing stochastic uncertainty into the computational results. For a recent review on image restoration MCMC techniques, see […].

The applications and extensions of these pioneering works are too numerous to be listed here. Good monographs exist and the necessary references can be found there […].



2. Image reconstruction in astronomy and elsewhere 

The problem of image reconstruction from incomplete, noisy data is especially important in astronomy, where the experimental conditions are largely set by the nature of distant objects, weather conditions, etc., all mainly out of the control of the observer, as well as in other disciplines like medicine and geology, with similar limitations on arranging the object of observation for an optimal measurement. Some of the most prominent methods of image reconstruction, which are based on a Bayesian implementation of an assumed data model, are the Wiener filter […], the Richardson-Lucy algorithm […], and the maximum-entropy image restoration [25] (see also […]).

The Wiener filter can be regarded as a full Bayesian image inference method in the case of Gaussian signal and noise statistics, as we will show in Sect. III B. It will be the workhorse of the IFT formalism, since the Wiener filter represents the algorithm to construct the exact field theoretical expectation value given the data for an interaction-free information Hamiltonian. The filter can be decomposed into two essential information processing steps: first building the information source by response-over-noise weighting the data, and then propagating this information through the signal space by applying the so-called Wiener variance.
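The two steps can be sketched in Python as follows, assuming a linear data model $d = R s + n$ with known Gaussian signal covariance $S$ and noise covariance $N$. The information source is then $j = R^\dagger N^{-1} d$, the Wiener variance is $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$, and the map is $m = D j$. The toy covariance, the masking response that observes only every second pixel, and the function name `wiener_filter` are illustrative assumptions; dense inverses are used for clarity, not efficiency.

```python
import numpy as np

def wiener_filter(d, R, N, S):
    """Posterior mean m = D j and Wiener variance D for the linear Gaussian model d = R s + n."""
    N_inv = np.linalg.inv(N)
    j = R.T @ N_inv @ d                                    # information source: response-over-noise weighted data
    D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)  # Wiener variance, propagating j through signal space
    return D @ j, D

rng = np.random.default_rng(0)
n_pix = 50
x = np.arange(n_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 5.0)  # assumed prior: exponential signal covariance
R = np.eye(n_pix)[::2]                              # assumed response: every second pixel observed
N = 0.1 * np.eye(R.shape[0])                        # assumed white noise covariance

s = rng.multivariate_normal(np.zeros(n_pix), S)
d = R @ s + rng.multivariate_normal(np.zeros(R.shape[0]), N)
m, D = wiener_filter(d, R, N, S)
```

The unobserved pixels still receive a non-trivial estimate, because $D$ spreads the source $j$ into them via the prior correlations in $S$. For large problems one would apply $D$ iteratively (e.g. with conjugate gradients) instead of forming it explicitly.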

The Richardson-Lucy algorithm is a maximum-likelihood method to reconstruct from Poissonian data and is therefore also of Bayesian origin. This method usually has to be regularized by hand, by truncation of the iterative calculations, against an over-fitting instability due to the missing (or implicitly flat) signal prior. A Gaussian-prior based regularization was recently proposed by Kitaura and Enßlin [38], and the implementation of a variant of this is presented here in Sect. V D.
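For orientation, the basic unregularized Richardson-Lucy iteration can be sketched as below. It assumes a known, strictly positive response matrix R (here a made-up Gaussian blur) and Poisson counts d ~ Poisson(R u); it is the plain maximum-likelihood (EM) update, not the Gaussian-prior regularized variant of [38], and all names are illustrative.

```python
import numpy as np

def richardson_lucy(d, R, n_iter):
    """Unregularized Richardson-Lucy (EM) iteration for Poisson data d ~ Poisson(R u)."""
    u = np.ones(R.shape[1])                  # flat, positive starting image
    norm = R.sum(axis=0)                     # R^T 1, the column sums of the response
    for _ in range(n_iter):
        ratio = d / (R @ u)                  # observed over predicted counts
        u = u * (R.T @ ratio) / norm         # multiplicative update keeps u positive
    return u

def poisson_loglike(d, lam):
    """Poisson log-likelihood up to the data-only term -sum(log d!)."""
    return np.sum(d * np.log(lam) - lam)

rng = np.random.default_rng(1)
n = 40
x = np.arange(n)
R = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)  # assumed Gaussian blur response
u_true = 5.0 + 20.0 * (np.abs(x - 20) < 4)                  # toy positive image
d = rng.poisson(R @ u_true)

u_5 = richardson_lucy(d, R, 5)
u_50 = richardson_lucy(d, R, 50)
```

Each iteration does not decrease the Poisson likelihood, so running ever more iterations keeps fitting the data more closely; without a prior this is the over-fitting tendency that truncation is meant to control.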

Maximum entropy algorithms will not be the topic here, nor will a number of other existing methods, which are partly within and partly outside the Bayesian framework. They may be found in existing reviews on this topic [e.g. …].



3. Statistical and Bayesian field theory 

The relation of signal reconstruction problems and field theory was discovered independently by several authors. In cosmology, a prominent work in this direction is Bertschinger [41], in which the path integral approach was proposed to sample primordial density perturbations with Gaussian statistics under the constraint of existing information on the large-scale structure. The work presented here can be regarded as a non-linear, non-Gaussian extension of this. Many methods from statistics and from statistical mechanics were of course used even earlier, e.g. the usage of moment generating functions for cosmic density fields can already be found in Fry […].

Simultaneously with Bertschinger's work, Bialek and Zee […] argued that visual perception can be modeled as a field theory for the true image, being distorted by noise and other data transformations, which are summarized by a nuisance field. A probabilistic language was used, but no direct reference to information theory was made, since the aim was not the optimal information reconstruction but a model for the human visual reception system. However, this work actually triggered our research.

Bialek et al. […] applied a field theoretical approach to recover a probability distribution from data. Here, a Bayesian prior was used to regularize the solution, which was set up ad hoc to enforce smoothness of the reconstruction, obtained from the classical (or saddle-point, or maximum a posteriori) solution of the problem. However, an "optimal" value for the smoothness controlling parameter was derived from the data itself, a topic also addressed by Stoica et al. […] and by a follow-up publication to ours […]. Bialek et al. […] also recognized, as we do, that an IFT can easily be non-local.



Finally, the work of Lemm and coworkers […] established a tight connection between statistical field theory and Bayesian inference, and proposed the term Bayesian field theory (BFT) for this. However, we prefer the term information field theory since it puts the emphasis on the relevant object, the information, whereas BFT refers to a method, Bayesian inference. The term information field is rather self-explanatory, whereas the meaning of a Bayesian field is not that obvious.

The applications considered by Lemm concentrate on the reconstruction of probability fields over parameter spaces and quantum mechanical potentials by means of the maximum a posteriori equation. The extensive book summarizing the essential insights of these papers [48] clearly states the possibility of perturbative expansions of the field theory. However, this is not followed up by these authors, probably for reasons of the computational complexity of the required algorithms. In contrast to many of the previous works on IFT, which deal with ad hoc priors, the publication by Lemm [56] is remarkable, since it provides explicit recipes of how to implement a priori information in various circumstances more rigorously.

The mathematical tools required to tackle IFT problems come from SFT and QFT, which have a vast literature. We have especially made use of the books of Binney et al. [57], Peskin and Schroeder [58], and Zee [59].



4. Cosmological large-scale structure

Our first IFT example in Sec. V is geared towards improving galaxy-survey based cosmography, the reconstruction of the large-scale structure matter distribution. We provide here a short overview of the relevant background and works.

The LSS of the matter distribution of the Universe is traced by the spatial distribution of galaxies, and is therefore well observable. This structure is believed to have emerged from tiny, mostly Gaussian initial density fluctuations of a relative strength of $10^{-5}$ via a self-gravitational instability, partly counteracted by the expansion of the Universe. The initial density fluctuations are believed to be produced during an early inflationary epoch of the Universe, and to carry valuable information about the inflaton, the field which drove inflation, in their N-point correlation functions, to be extracted from the observational data.

The onset of the structure formation process is well described by linear perturbation theory, which therefore conserves Gaussianity; however, the later evolution, the structures on smaller scales, and especially the galaxy formation require non-linear descriptions. The observational situation is complicated by the fact that the most important galaxy distance indicator, the redshift, is also sensitive to the galaxy peculiar velocity, which causes the observational data on the three-dimensional LSS to be partially degenerate. There are analytical methods to describe these effects², and also extensive work on N-body simulations of the structure formation, the latter probably providing us with the most detailed and accurate statistical data on the properties of the matter density field [e.g. …].

In recent years, it was recognized that the evolution of the cosmic density field and its statistical properties can be addressed with field theoretical methods by virtue of renormalization flow equations. This makes detailed semi-analytical calculations of the density field time propagator and of the two- and three-point correlation functions possible, which are expected to play an important role in future approaches to reconstruct the initial fluctuations from the observational data [73, …].

It was recognized early on that the primordial density fluctuations can in principle be reconstructed from galaxy observations [41]. This has led to a large development of various numerical techniques for an optimal reconstruction [e.g. …, 127, 128, 129, 130, 131]. Many of them are based on a Bayesian approach, since they are implementations and extensions of the Wiener filter. However, other principles are also used, e.g. the least action approach or Voronoi tessellation techniques [e.g. …]. A discussion and classification of the various methods can be found in […].

Especially the Wiener filter methods were extensively applied to galaxy survey data³ and permitted, in part, the extrapolation of the matter distribution into the zone of avoidance behind the galactic disk, closing the data gap there, cf. [157, 158, 159], a topic we also address in Sect. V.

Another cosmologically relevant information field to be extracted from galaxy catalogues is the LSS power spectrum [e.g. …]. This power is also measurable in the CMB, and for a long time the CMB provided the best spectrum normalization [165, 166].



5. Cosmic Microwave Background 

Since our second example deals with the CMB, we give a brief overview of it and of related inference methods.

The CMB reveals the statistical properties of the matter field at a time when the Universe was about 1100 times smaller in linear size than it is today. The photon-baryon fluid, which decouples at that epoch into neutral hydrogen and free-streaming photons, has responded to the gravitational pull of the then already forming dark matter structures. The photons from that epoch have since cooled due to the cosmic expansion into the CMB radiation we observe today, and carry information on the physical properties of the photon-baryon fluid of that time, like density, temperature and velocity. To very high accuracy, the spectrum of the photons from any direction is that of a blackbody, with a mean temperature of 2.7 Kelvin and fluctuations of the order of $10^{-5}$ Kelvin, imprinted by the primordial gravitational potentials at decoupling.

Therefore, mapping these temperature fluctuations permits a precise study of many cosmological parameters simultaneously, like the amount of dark matter producing the gravitational potentials, the ratio of photons to baryons, balancing the pressure and weight of the fluid, and geometrical and dynamical parameters of space-time itself. The observations are technically challenging and therefore require sophisticated algorithms to extract the tiny signal of temperature fluctuations against the instrument noise, but also to separate it from other astrophysical foreground emission with the best possible accuracy.



A number of such algorithms were developed [e.g. 168–183], which in many cases implement the Wiener filter. Thus, the required numerical tools for an IFT treatment of CMB data are essentially available.

The expected temperature fluctuation spectrum can be calculated from a linear perturbative treatment of the Boltzmann equations of all dynamically active particle species at this epoch, and fast computational implementations exist that permit its prediction for a given set of cosmological parameters. Well-known codes for this task are publicly available⁴ and permit the extraction of information on cosmological parameters from the measured CMB temperature fluctuation spectrum via comparison to their predictions for a given parameter set. It was recognized early on that this should happen in an information theoretically optimal way, and Bayesian methods were therefore adopted in that area well before other astrophysical disciplines [e.g. …].

The initial metric and density fluctuations, from which the CMB fluctuations and the LSS emerged, are believed to have been seeded by quantum fluctuations of a hypothetical inflaton field, which should have driven an inflationary expansion phase in the very early Universe […]. The inflaton-induced fluctuations have a very Gaussian probability distribution; however, some non-Gaussian features seem to be unavoidable in most scenarios and can serve as a fingerprint to discriminate among them [e.g. 198, 199, 200, 201]. Observational tests on such non-Gaussianities based on the three-point correlation function of the CMB data [e.g. 202, 203, 204, 205, 206] were so far mostly negative, but not sensitive enough to seriously constrain the possible theoretical parameter space of inflationary scenarios, see e.g. [207, 208]. Recently, there has been the claim of a detection of such non-Gaussianities by Yadav and Wandelt [209], and a confirmation of this with better data and improved algorithms is therefore highly desirable. In Sect. VI we make a proposal for improving the algorithmic side of this challenge. A recent review of the current status of CMB Gaussianity can be found in [210].

² Of special interest in this context may be [60], which already applies path integrals, [61, …, 73, …], and the papers they refer to.

³ Survey-based reconstructions of the cosmic matter fields can be found in [139, 140, 141, …, 150, …, 154, 155, 156].

⁴ E.g. cmbfast (http://ascl.net/cmbfast.html, http://cmbfast.org […]), camb (http://camb.info/ […]), and cmbeasy (http://www.cmbeasy.org/ [187]).



II. CONCEPTS OF INFORMATION FIELD THEORY

A. Information on physical fields 

In our attempts to infer the properties of our Universe from astronomical observations we are faced with the problem of how to interpret incomplete, imperfect and noisy data, draw our conclusions based on them and quantify the uncertainties of our results. This is true for using galaxy surveys to map the cosmic LSS, for the interpretation of the CMB, as well as for many experiments in physical laboratories and compilations of geological, economical, sociological, and biological data about our planet. Information theory, which is based on probability theory and the Bayesian interpretation of missing knowledge as probabilistic uncertainty, offers an ideal framework to handle such problems. It permits a probabilistic description of all relevant processes involved in the measurement, provided a model for the Universe or the system under consideration is adopted.

The states of such a model, denoted by the state variable $\psi$, are identified with the possible physical realities. They can have probabilities $P(\psi)$ assigned to them, the so-called prior information. This prior contains our knowledge about the Universe as we model it before any other data is taken. For a given cosmological model, the prior may be the probability distribution of the different initial conditions of the Universe, which determine the subsequent evolution completely. Since our Universe is spatially extended, the state variable will in general contain one or several fields, which are functions over some coordinates $x$.

The measurement process is also described by a data model, which defines the so-called likelihood, the probability $P(d|\psi)$ to obtain a specific dataset $d$ given the physical condition $\psi$. In case the outcome $d$ of the measurement is deterministic, $P(d|\psi) = \delta(d - d[\psi])$, where $d[\psi]$ is the functional dependence of the data on the state. In any case, the probability distribution function of the data,

$$P(d) = \int \mathcal{D}\psi\, P(d|\psi)\, P(\psi), \tag{1}$$

is given in terms of a phase-space or path integral over all possible realizations of $\psi$, to be defined more precisely later (Sect. III).

A scientist is not actually interested in the total state of the Universe, but only in some specific aspects of it, which we call the signal $s = s[\psi]$. The signal is a very reduced description of the physical reality, and can be any function of its state $\psi$, freely chosen according to the needs and interests of the scientist or the ability and capacity of the measurement and computational devices used. Since the signal does not contain the full physical state, any physical degree of freedom which is not present in the signal but influences the data will be perceived as probabilistic uncertainty, or, in short, noise. The probability distribution function of the signal, its prior

$$P(s) = \int \mathcal{D}\psi\, \delta(s - s[\psi])\, P(\psi), \tag{2}$$

is related to that of the data via the joint probability

$$P(d,s) = \int \mathcal{D}\psi\, \delta(s - s[\psi])\, P(d|\psi)\, P(\psi), \tag{3}$$

from which the conditional signal likelihood

$$P(d|s) = P(d,s)/P(s) \tag{4}$$

and signal posterior

$$P(s|d) = P(d,s)/P(d) \tag{5}$$

can be derived.

Before the data is available, the phase space of interest is spanned by the direct product of all possible signals $s$ and data $d$, and all regions with non-zero $P(d,s)$ are of potential relevance. Once the actual data $d_\mathrm{obs}$ have been taken, only a sub-manifold of this space, as fixed by the data, is of further relevance. The probability function over this subspace is proportional to $P(d = d_\mathrm{obs}, s)$, and needs just to be renormalized by dividing by

$$\int \mathcal{D}s\, P(d_\mathrm{obs}, s) = \int \mathcal{D}s \int \mathcal{D}\psi\, \delta(s - s[\psi])\, P(d_\mathrm{obs}|\psi)\, P(\psi) = \int \mathcal{D}\psi\, P(d_\mathrm{obs}|\psi)\, P(\psi) = P(d_\mathrm{obs}), \tag{6}$$

which is the unconditioned probability (or evidence) of that data. Thus, we find the resulting information of the data to be the posterior distribution $P(s|d_\mathrm{obs}) = P(d_\mathrm{obs}, s)/P(d_\mathrm{obs})$. This posterior is the fundamental mathematical object from which all our deductions have to be made. It is related via Bayes's theorem [1] to the usually better accessible signal likelihood,

$$P(s|d) = P(d|s)\, P(s)/P(d), \tag{7}$$

which follows from Eqs. (4) and (5).

The normalization term in Bayes's theorem, the evidence $P(d)$, is now also fully expressed in terms of the joint probability of data and signal,

$$P(d) = \int \mathcal{D}s\, P(d,s), \tag{8}$$
and the underlying physical field $\psi$ basically becomes invisible at this stage in the formalism. The evidence plays a central role in Bayesian inference, since it is the likelihood of all the assumed model parameters. Combining this parameter likelihood with parameter priors, one can start Bayesian inference on the model classes.
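The marginalizations in Eqs. (1) to (8) can be checked on a small discrete toy model, sketched below in Python under purely illustrative assumptions: a five-valued state ψ, the many-to-one signal s[ψ] = ψ², binary data, and made-up probabilities. Path integrals become sums and δ(s − s[ψ]) becomes an indicator matrix. The evidence computed by summing over ψ, Eq. (1), agrees with the one computed from the joint signal-data probability, Eq. (8), so ψ drops out once P(d, s) is known.

```python
import numpy as np

# Hypothetical discrete toy model: five possible states psi and a many-to-one signal s[psi] = psi**2.
psi_values = np.arange(-2, 3)                     # psi in {-2, -1, 0, 1, 2}
prior_psi = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # P(psi), made-up numbers

def signal_of(psi):
    return psi ** 2

s_values = np.unique(signal_of(psi_values))       # possible signals {0, 1, 4}

# Hypothetical likelihood for binary data d in {0, 1}: row d holds P(d|psi) for every psi.
p_d1 = np.array([0.9, 0.6, 0.1, 0.6, 0.9])
likelihood = np.stack([1.0 - p_d1, p_d1])         # shape (2, 5)

# delta(s - s[psi]) becomes an indicator matrix of shape (3, 5).
delta = (s_values[:, None] == signal_of(psi_values)[None, :]).astype(float)

P_s = delta @ prior_psi                           # Eq. (2): signal prior
P_ds = likelihood @ (delta * prior_psi).T         # Eq. (3): joint P(d, s), shape (2, 3)
P_d_from_psi = likelihood @ prior_psi             # Eq. (1): evidence via the state
P_d_from_s = P_ds.sum(axis=1)                     # Eq. (8): evidence via the signal
signal_likelihood = P_ds / P_s[None, :]           # Eq. (4): P(d|s), one column per s
posterior = P_ds / P_d_from_s[:, None]            # Eq. (5): P(s|d), one row per d
```

Once `P_ds` is tabulated, none of the later quantities refers to ψ any more.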

B. Signal response and noise 

If signal and data depend on the same underlying physical properties, there may be correlations between the two, which can be expressed in terms of signal response $R$ and noise $n$ of the data as

$$d = R[s] + n_s. \tag{9}$$

We have chosen two different ways of denoting the dependence of response and noise on the signal $s$, in order to highlight that the response should embrace most of the reaction of the data to the signal, whereas the noise should be as independent as possible. We ensure this by putting the linear correlation of the data with the signal fully into the response. The response is therefore the part of the data which correlates with the signal,

$$R[s] = \langle d \rangle_{(d|s)} = \int \mathcal{D}d\, d\, P(d|s), \tag{10}$$

and the noise is just defined as the remaining part which does not:

$$n_s = d - R[s] = d - \langle d \rangle_{(d|s)}. \tag{11}$$

Although the noise might depend on the signal, as is well known for example for Poissonian processes, it is, by definition, linearly uncorrelated with it,

$$\langle n_s\, s^\dagger \rangle_{(d|s)} = \left(\langle d \rangle_{(d|s)} - R[s]\right) s^\dagger = 0, \tag{12}$$

whereas higher-order correlations might well exist and may be further exploited for their information content. The dagger denotes complex conjugation and transposition of a vector or matrix.
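As a numerical illustration of Eqs. (9) to (12), the sketch below assumes a Poissonian data model with a linear expected count, d ~ Poisson(λ s), with a made-up λ = 3 and a log-normal toy signal prior (both are assumptions, not the models used later in this work). Then R[s] = λ s exactly, and the Monte Carlo estimates show a noise that is linearly uncorrelated with the signal while its variance grows with s.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 200_000
lam = 3.0                                    # assumed counts per unit signal, so R[s] = lam * s

# Assumed signal prior: log-normal, so that s > 0 as required for a Poisson rate.
s = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)
d = rng.poisson(lam * s)

response = lam * s                           # R[s] = <d>_(d|s), Eq. (10)
noise = d - response                         # n_s = d - R[s], Eq. (11)

# Eq. (12): the noise is linearly uncorrelated with the signal ...
corr_noise_signal = np.mean(noise * s)
# ... but its variance depends on the signal: for Poisson data Var(n_s | s) = R[s].
low, high = s < np.median(s), s >= np.median(s)
var_low, var_high = noise[low].var(), noise[high].var()
```

The signal-dependent noise variance is exactly the higher-order correlation the text mentions; it carries information about s even though the linear correlation vanishes.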

These definitions were chosen to be close to the usual language in signal processing and data analysis. They permit the definition of signal response and noise for an arbitrary choice of the signal $s[\psi]$. No direct causal connection between signal and data is needed in order to have a non-trivial response, since both variables just need to exhibit some couplings to a common sub-aspect of $\psi$. The above definition of response and noise is, however, not unique, even for a fixed signal definition, since any data transformation $d' = T[d]$ can lead to different definitions, as seen from

$$R'[s] \equiv \langle d' \rangle_{(d|s)} = \langle T[d] \rangle_{(d|s)} \neq T[\langle d \rangle_{(d|s)}] = T[R[s]], \tag{13}$$

where the inequality holds in general for non-linear $T$.

Exceptions are some unique relations between signal and state, $P(\psi|s) = \delta(\psi - \psi[s])$, and maybe a few other very special cases. Thus, the concepts of signal response and the noise defined therewith depend on the adopted coordinate system in the data space. This coordinate system can be changed via a data transformation $T$, and the transformed data may exhibit a better or worse response to the signal. Information theory aids in designing a suitable data transformation, so that the signal response is maximal and the signal noise is minimal, permitting the signal to be best recovered. Thus, we may aim for an optimal $T$, which yields

$$T[d] = \langle s \rangle_{(s|d)}. \tag{14}$$

We define the posterior average of the signal, $m_d = \langle s \rangle_{(s|d)}$, to be the map of the signal given the data $d$, and call $T$ a map-making algorithm if it fulfills Eq. (14) at least approximately. As a criterion for this, one may require that the signal response of a map-making algorithm,

$$R_T[s] \equiv \langle T[d] \rangle_{(d|s)}, \tag{15}$$

is positive definite with respect to signal variations, as stated by

$$\frac{\delta R_T[s]}{\delta s} > 0. \tag{16}$$

This ensures that a map-making algorithm will respond with a non-negative correlation of the map to any signal feature, with respect to the noise ensemble. In general, $T$ will be a non-linear operation on the data, to be constructed from information theory if it should be optimal in the sense of Eq. (14). In any case, the fidelity of a signal reconstruction can be characterized by the quadratic signal uncertainty,

$$\sigma^2_{T|d} = \langle (s - T[d])(s - T[d])^\dagger \rangle_{(s|d)}, \tag{17}$$

averaged over the realizations of signal and noise compatible with the data. Of special interest is the trace of this,

$$\mathrm{Tr}(\sigma^2_{T|d}) = \int dx\, \langle |s_x - T[d]_x|^2 \rangle_{(s|d)}, \tag{18}$$

since it is the expectation value of the squared Lebesgue $L^2$-space distance between a signal reconstruction and the underlying signal. Requesting a map-making algorithm to be optimal with respect to Eq. (18) implies $T[d] = \langle s \rangle_{(s|d)}$, since $\langle |s_x - T[d]_x|^2 \rangle_{(s|d)}$ is minimized at $T[d]_x = \langle s_x \rangle_{(s|d)}$ for every $x$, and therefore that the algorithm is optimal in an information theoretical sense according to Eq. (14).

The uncertainty $\sigma^2_{T,d}$ depends on $d$, since in Bayesian inference one averages over the posterior, which is conditional on the data. The frequentist uncertainty estimate, which is the expected uncertainty of any estimator before the data are obtained, is given by an average over the joint probability function:

$$\sigma^2_{T} = \left\langle (s - T[d])\,(s - T[d])^\dagger \right\rangle_{(d,s)}. \qquad (19)$$

The latter is a good quantity to characterize the overall performance of an estimator, whereas $\mathrm{Tr}(\sigma^2_{T,d})$ is a more precise indicator of the actual estimator performance for a given dataset. As we will see in our IFT applications, data dependence of the uncertainty is a common feature of non-linear inference problems.

An illustrative example may be in order. Suppose our data are an exact copy of a physical field, $d = \psi$, our signal is the square of the latter, $s = \psi^2$, and the physical field obeys even statistics, $P(\psi) = P(-\psi)$. Then the signal response is exactly zero, $R[s] = 0$, and the data contain only noise with respect to the chosen signal, $d = n_s$. Thus, we have chosen a bad representation of our data to reveal the signal. If we, however, introduce the transformation $d' = T[d] = d^2$, we find a perfect response, $R'[s] = s$, and zero noise, $n'_s = 0$.
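The toy example above can be checked numerically. The following Python sketch assumes NumPy; the standard-normal choice for $P(\psi)$, the sample size and the binning in $s$ are illustrative assumptions, not part of the original argument. It estimates the response $R[s] = \langle d \rangle_{(d|s)}$ by averaging the data within narrow bins of $s = \psi^2$, and confirms that the transformed data $d' = d^2$ reproduce the signal exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Physical field with even statistics P(psi) = P(-psi); standard normal is an assumed example.
psi = rng.standard_normal(500_000)
s = psi**2          # signal: square of the field
d = psi             # data: exact copy of the field

# Estimate the response R[s] = <d>_(d|s) by averaging d inside narrow bins of s.
edges = np.linspace(0.0, 2.0, 21)
bin_index = np.digitize(s, edges)           # bin k covers edges[k-1] <= s < edges[k]
response = np.array([d[bin_index == k].mean() for k in range(1, len(edges))])

# Transformed data d' = T[d] = d**2 has perfect response and zero noise.
d_prime = d**2

print(np.round(response, 2))    # all entries close to zero
print(np.allclose(d_prime, s))  # True
```

Any estimator that responds linearly to $d$ is therefore blind to this signal, while the non-linear transformation $T[d] = d^2$ recovers it without error.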

In this case, finding the optimal map-making algorithm was trivial, but in more complicated situations it cannot be guessed that easily. Since the response and noise definitions depend on the signal definition, some thought should be given to choosing the signal in a way that it can be well reconstructed.



C. Signal design 

For practical reasons, one will usually choose $s$ according to a few guidelines, which should simplify the information induction process:

1. The functional form of $s[\psi]$ should best be simple, steady, analytic, and if possible linear in $\psi$, permitting one to use the signal $s$ to reason about the state of reality $\psi$.

2. The degrees of freedom of $s$ should be related to those of the data $d$ in the sense that cross-correlations exist which permit one to deduce properties of $s$ from $d$. Signal degrees of freedom which are insensitive to the data will only be constrained by the prior and therefore just contain a large amount of uncertainty. This adds to the error budget and should be avoided as far as possible.

3. The choice of $s[\psi]$ should also be led by mathematical convenience and practicality. In the examples presented in this work, simple signals are chosen which permit one to guess good approximations for the signal likelihood $P(d|s)$ and prior $P(s)$ without the need to develop the full physical theory starting with $P(\psi)$.

To give a more specific example, we assume a cosmological model in which the reality is thought to be solely characterized by the primordial dark matter density distribution $\psi(x)$, from which all observable cosmological phenomena like galaxies derive in a deterministic way. The coordinate $x$ may refer to the comoving coordinates at some early epoch of the Universe. Although the LSS of the matter distribution at a later time may predominantly depend on the initial large-scale modes, and is reflected in the galaxy distribution, the actual positions of the individual galaxies also depend in a non-trivial way on the small-scale modes. Due to the discreteness of our observable, the galaxy positions, it may be impossible to reconstruct these small-scale modes. Therefore, it could be sensible to define a signal $s[\psi] = F\psi$, with $F$ being a linear low-pass filter which suppresses all small-scale structures. This signal may be reconstructible with high precision, whereas any attempt to reconstruct $\psi$ directly would be plagued by a larger error budget, since all the data-unconstrained small-scale modes represent uncertainties for a reconstruction of $\psi$, but not for one of $s$, which is defined as a low-pass filtered version of $\psi$.
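A signal defined as $s = F\psi$ with a sharp Fourier-space cut is easy to sketch. The example below is a hypothetical one-dimensional, periodic stand-in: white noise plays the role of $\psi$, the cutoff wavenumber `k_max` is an arbitrary choice, and NumPy's FFT is assumed. It shows that such an $F$ is a projection, $F^2 = F$, and that discarding the small-scale modes lowers the variance of the field to be reconstructed.

```python
import numpy as np

def low_pass(psi, k_max):
    """Hypothetical sharp low-pass filter F: keep Fourier modes with |k| <= k_max."""
    n = psi.size
    k = np.fft.fftfreq(n) * n          # integer wavenumbers of the periodic grid
    psi_k = np.fft.fft(psi)
    psi_k[np.abs(k) > k_max] = 0.0     # suppress all small-scale structure
    return np.fft.ifft(psi_k).real     # mask is symmetric in k, so the result is real

rng = np.random.default_rng(1)
psi = rng.standard_normal(256)         # white-noise stand-in for the physical field
s = low_pass(psi, k_max=16)            # signal s = F psi
```

Only the retained large-scale modes then need to be constrained by the data; the discarded ones never enter the error budget of $s$.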



D. Signal moment calculation 

The information of some data $d$ on a signal $s$ defined over some set $\Omega$, which in most applications will be a manifold like a sub-volume of $\mathbb{R}^n$, or the sphere in case of a CMB signal, is completely contained in the posterior $P(s|d)$ of the signal given the data.^ The expectation value of $s$ at some location $x \in \Omega$, and higher correlation functions of $s$, can all be obtained from the posterior by taking the appropriate average:

$$\langle s(x_1) \cdots s(x_n) \rangle_d = \langle s(x_1) \cdots s(x_n) \rangle_{(s|d)} = \int \mathcal{D}s\; s(x_1) \cdots s(x_n)\, P(s|d). \qquad (20)$$

The problem is that often neither the expectation values nor even the posterior are easily calculated analytically, even for fairly simple data models. Fortunately, there is at least one class of data models for which the posterior and all its moments can be calculated exactly, namely when the posterior turns out to be a multivariate Gaussian in $s$. In this case, analytical formulae for all moments of the signal are known and are in principle computable. Technically, one is still often facing a huge, but linear, inverse problem. However, in the last decades a number of high-performance computational map-making techniques were developed to tackle such problems, either on the sphere, for CMB research, or in flat spaces with one, two, or three dimensions, for example for the reconstruction of the cosmic LSS (detailed references are given in Sect. II C). The purpose of this work is to show how to expand other posterior distributions around Gaussian ones in a perturbative manner, which then permits one to use the existing map-making codes for the computation of the resulting diagrammatic perturbation series. Since diagrammatic perturbation series in Feynman diagrams are well known and understood in QFT and SFT, the most economical way is to reformulate the information-theoretical problem in a language which is as close as possible to the former two theories. Thereby, many of the results and concepts become directly available for signal inference problems. Moreover, it seems that expressing the optimal signal estimator in terms of Feynman diagrams immediately provides computationally efficient algorithms, since the diagrams encode the skeleton of the minimal necessary computational information flow.

^ We are mostly dealing with scalar fields; however, multi-component, vector, or tensor fields can be treated analogously, and many of the equations just have to be re-interpreted for such fields and stay valid.
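For a Gaussian posterior, all higher moments follow from the first two. As an illustration, here is a sketch that assumes NumPy and an arbitrarily chosen 3×3 covariance `D` for the fluctuation $\phi = s - m_d$. Isserlis' (Wick's) theorem gives $\langle \phi_a \phi_b \phi_c \phi_e \rangle = D_{ab}D_{ce} + D_{ac}D_{be} + D_{ae}D_{bc}$, and a Monte Carlo estimate reproduces this value.

```python
import numpy as np

def wick_four_point(D, a, b, c, e):
    """<phi_a phi_b phi_c phi_e> of a zero-mean Gaussian with covariance D (Isserlis)."""
    return D[a, b] * D[c, e] + D[a, c] * D[b, e] + D[a, e] * D[b, c]

# Illustrative posterior covariance of the fluctuation phi = s - m_d (assumed values).
D = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

rng = np.random.default_rng(2)
phi = rng.multivariate_normal(np.zeros(3), D, size=400_000)

exact = wick_four_point(D, 0, 1, 2, 2)                    # 0.5*1.0 + 2*0.2*0.3 = 0.62
monte_carlo = np.mean(phi[:, 0] * phi[:, 1] * phi[:, 2] ** 2)
```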



E. Signal and data spaces 

1. Discretisation and continuous limit 

Both the signal and the data space may be continuous; in practice, however, they will most often be discrete, since digital data processing only permits a discretized representation of the distributed information. The space in which the data and signal discretisation happens can be chosen freely, and can of course also be a Fourier, wavelet, or spherical harmonics space. Even if we would like to analyze a continuous signal, the computationally required discretisation will force an implicit redefinition of our actual signal to be the discretely sampled version of that continuous signal, and this discretisation step should also be part of the data model if it has the potential to significantly affect the analysis [e.g. see 21].

Although discretisation implies some information loss, it also has an advantage: we can simply assume discretisation and therefore read all scalar and tensor products as the usual component-wise ones, now just in high- but finite-dimensional vector spaces.

To be concrete, let $\{x_i\} \subset \Omega$ be a discrete set of $N_\mathrm{pix}$ pixel positions, each of which has a volume $V_i$ attributed to it. Then the scalar product of two discretized function vectors $f = (f_i)$ and $g = (g_i)$, sampled at these points via $f_i = f(x_i)$ and $g_i = g(x_i)$, could be defined by

$$g^\dagger f = \sum_{i=1}^{N_\mathrm{pix}} V_i\, g_i^*\, f_i. \qquad (21)$$

The asterisk denotes complex conjugation. This scalar product has the continuous limit

$$g^\dagger f \to \int dx\, g^*(x)\, f(x). \qquad (22)$$

In many cases, the actual volume normalization in Eq. (21) does not matter for final results, since it usually cancels out, and therefore $V_i$ is often dropped completely for equidistant sampling of signal and data spaces. The volume terms also disappear for a scalar product involving a function which is discretized via volume integration, $f_i = \int_{V_i} dx\, f(x)$, e.g. the number of counts within cell $i$. Higher-order tensor products are defined analogously.
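Eq. (21) and its continuous limit, Eq. (22), can be illustrated with a minimal sketch. It assumes NumPy, and the interval $[0, \pi]$, the midpoint pixelization and the test functions $f = \sin x$, $g = x$ are arbitrary choices. The volume-weighted discrete product should converge to $\int_0^\pi x \sin x\, dx = \pi$ as the number of pixels grows.

```python
import numpy as np

def scalar_product(g, f, volumes):
    """Discretized g^dagger f = sum_i V_i conj(g_i) f_i, cf. Eq. (21)."""
    return np.vdot(g, volumes * f)     # np.vdot conjugates its first argument

def pixelize(n_pix, a=0.0, b=np.pi):
    """Equal-volume pixels on [a, b]; returns pixel centres x_i and volumes V_i."""
    edges = np.linspace(a, b, n_pix + 1)
    return 0.5 * (edges[:-1] + edges[1:]), np.diff(edges)

for n_pix in (10, 100, 1000):
    x, V = pixelize(n_pix)
    value = scalar_product(x, np.sin(x), V)
    print(n_pix, abs(value - np.pi))   # discretisation error shrinks with resolution
```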



The path integral of a functional $F[f] = F(f_1, \ldots, f_{N_\mathrm{pix}})$ over all realizations of such a discretized field $f$ is then just a high-dimensional volume integral, with as many dimensions as pixels:

$$\int \mathcal{D}f\, F[f] \equiv \int df_1 \cdots \int df_{N_\mathrm{pix}}\, F(f_1, \ldots, f_{N_\mathrm{pix}}). \qquad (23)$$

This definition of a finite-dimensional path integral is well normalized: if we want to integrate over a probability distribution over $f$ which is separable for all pixels, $P(f) = \prod_{i=1}^{N_\mathrm{pix}} P_i(f_i)$, e.g. for white and Poissonian noise, we find

$$\int \mathcal{D}f\, P(f) = \prod_{i=1}^{N_\mathrm{pix}} \int df_i\, P_i(f_i) = 1. \qquad (24)$$

Although in real data-analysis applications it is practically never required to perform the continuous limit $N_\mathrm{pix} \to \infty$ with $V_i \to 0$ for all $i$, we stress that this limit can formally be taken and is well defined even for the path integral, as we argue in more detail in Sec. IV B. The basic argument is that suitable signals could and should be defined in such a way that path-integral divergences, which sometimes plague QFT, can easily be avoided by sensible signal design. Practically, the existence of a well-defined continuous limit of a well-posed IFT implies that two numerical implementations of a signal reconstruction problem, which differ in their space discretisation on scales smaller than the structures of the signal, can be expected to provide identical results up to a small discretisation difference, which vanishes with increasing resolution.

2. Parameter spaces 

In many applications, the signal space is identified with the physical space or with the sphere of the sky. However, IFT can also be done over parameter spaces. In Sec. VI, a field theory over the sphere will implicitly define the knowledge state for an unknown parameter of that theory, which can again be regarded as defining an information theory for that parameter. The latter is an IFT in case the parameter has spatial variations.

However, there are also functions defined over a parameter space, $\Omega_\mathrm{parameter} = \{p\}$ for some parameter $p$, about which one might want to obtain knowledge from incomplete data. A very important one is the probability distribution of the parameter given the observational data, $P(p|d)$, which defines our parameter-knowledge state. This function may only be incompletely known and therefore require an IFT approach for its reconstruction and interpolation. Such incomplete knowledge of the function could be due to incomplete numerical sampling of its function values, because of large computational costs and the huge volumes of multi-dimensional parameter spaces. Or there might be another unknown nuisance parameter $q$ in the problem, which induces an uncertainty in $P(p|d)$ and therefore an IFT over all possible realizations of this knowledge-state field function via

$$\mathcal{P}\big[P(p|d)\big] = \int \mathcal{D}P(p,q|d)\; \mathcal{P}\big[P(p,q|d)\big]\; \delta\!\left[P(p|d) - \int dq\, P(p,q|d)\right]. \qquad (25)$$

In case $q$ is a field, the marginalisation integral in the delta functional also becomes a path integral. Probabilistic decision theory, based on knowledge states as expressed by probability functions on parameters, has to deal with such complications. For inference directly on $p$, and not on the knowledge state $P(p|d)$, the marginalized probability

$$P(p|d) = \int dq\, P(p,q|d) \qquad (26)$$

contains all relevant information, and that will be sufficient for most inference applications, and especially for the ones in this work.
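Eq. (26) is often evaluated on a grid in practice. The minimal sketch below assumes NumPy and takes a hypothetical correlated Gaussian as the joint posterior $P(p,q|d)$, so that the exact marginal is known. It sums out the nuisance parameter $q$, using the cell size as integration weight.

```python
import numpy as np

# Hypothetical joint posterior P(p, q | d): zero-mean Gaussian with covariance C.
C = np.array([[1.0, 0.8],
              [0.8, 2.0]])
C_inv = np.linalg.inv(C)
norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))

p = np.linspace(-6.0, 6.0, 601)
q = np.linspace(-10.0, 10.0, 1001)
dp, dq = p[1] - p[0], q[1] - q[0]
P_grid, Q_grid = np.meshgrid(p, q, indexing="ij")
quad_form = (C_inv[0, 0] * P_grid**2 + 2 * C_inv[0, 1] * P_grid * Q_grid
             + C_inv[1, 1] * Q_grid**2)
joint = norm * np.exp(-0.5 * quad_form)          # P(p, q | d) on the grid

# Eq. (26): P(p|d) = int dq P(p, q | d), approximated by a Riemann sum over q.
marginal = joint.sum(axis=1) * dq
exact_marginal = np.exp(-0.5 * p**2 / C[0, 0]) / np.sqrt(2.0 * np.pi * C[0, 0])
```

If $q$ were itself a field, the single sum over `q` would become the path integral mentioned above.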



III. BASIC FORMALISM 

A. Information Hamiltonian 

We argued that the posterior $P(s|d)$ contains all available information on the signal. Although the posterior might not be easily accessible mathematically, we assume in the following that the prior $P(s)$ of the signal before the data are taken, as well as the likelihood of the data given a signal, $P(d|s)$, are known or can at least be Taylor-Fréchet-expanded around some reference field configuration $t$. Then Bayes' theorem permits us to express the posterior as

$$P(s|d) = \frac{P(d,s)}{P(d)} = \frac{P(d|s)\,P(s)}{P(d)} \equiv \frac{1}{Z}\, e^{-H[s]}. \qquad (27)$$

Here, the Hamiltonian

$$H[s] \equiv H_d[s] \equiv -\log\left[P(d,s)\right] = -\log\left[P(d|s)\,P(s)\right], \qquad (28)$$

the evidence of the data

$$P(d) = \int \mathcal{D}s\, P(d|s)\,P(s) = \int \mathcal{D}s\, e^{-H[s]} \equiv Z, \qquad (29)$$

and the partition function $Z = \int \mathcal{D}s\, e^{-H[s]}$ were introduced. It is extremely convenient to include a moment-generating function in the definition of the partition function:

$$Z[J] \equiv \int \mathcal{D}s\, e^{-H[s] + J^\dagger s}. \qquad (30)$$

This means $P(d) = Z = Z[0]$, but it also permits the calculation of any moment of the signal field via Fréchet differentiation of Eq. (30):

$$\langle s(x_1) \cdots s(x_n) \rangle_d = \left. \frac{1}{Z}\, \frac{\delta^n Z[J]}{\delta J(x_1) \cdots \delta J(x_n)} \right|_{J=0}. \qquad (31)$$



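Of special importance are the so-called connected correlation functions or cumulants

$$\langle s(x_1) \cdots s(x_n) \rangle^\mathrm{c}_d \equiv \left. \frac{\delta^n \log Z[J]}{\delta J(x_1) \cdots \delta J(x_n)} \right|_{J=0}, \qquad (32)$$

which are corrected for the contribution of lower moments to a correlator of order $n$. For example, the connected mean and dispersion are expressed in terms of their unconnected counterparts as

$$\langle s(x) \rangle^\mathrm{c}_d = \langle s(x) \rangle_d, \qquad \langle s(x)\,s(y) \rangle^\mathrm{c}_d = \langle s(x)\,s(y) \rangle_d - \langle s(x) \rangle_d\, \langle s(y) \rangle_d, \qquad (33)$$

where the last term represents such a correction. For Gaussian random fields, all higher-order connected correlators vanish:

$$\langle s(x_1) \cdots s(x_n) \rangle^\mathrm{c}_d = 0 \qquad (34)$$

for $n > 2$. For non-Gaussian random fields, they are in general non-zero, and for later usage we provide the connected three- and four-point functions,

$$\langle s_x s_y s_z \rangle^\mathrm{c}_d = \left\langle (s_x - \bar{s}_x)(s_y - \bar{s}_y)(s_z - \bar{s}_z) \right\rangle_d,$$
$$\langle s_x s_y s_z s_u \rangle^\mathrm{c}_d = \left\langle (s_x - \bar{s}_x)(s_y - \bar{s}_y)(s_z - \bar{s}_z)(s_u - \bar{s}_u) \right\rangle_d - \langle s_x s_y \rangle^\mathrm{c}_d \langle s_z s_u \rangle^\mathrm{c}_d - \langle s_x s_z \rangle^\mathrm{c}_d \langle s_y s_u \rangle^\mathrm{c}_d - \langle s_x s_u \rangle^\mathrm{c}_d \langle s_y s_z \rangle^\mathrm{c}_d, \qquad (35)$$

where we used $s_x = s(x)$ and defined $\bar{s}_x = \langle s(x) \rangle_d$.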
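Eq. (35) translates directly into sample estimators. The sketch below assumes NumPy, and the sample sizes and test distributions are arbitrary choices. For correlated Gaussian samples, the connected four-point function vanishes, as required by Eq. (34). For an exponential distribution with unit rate, the connected three- and four-point functions approach its known cumulants, $\kappa_3 = 2$ and $\kappa_4 = 6$.

```python
import numpy as np

def connected_3(samples, x, y, z):
    """<s_x s_y s_z>^c from Eq. (35); rows of `samples` are independent draws."""
    ds = samples - samples.mean(axis=0)
    return np.mean(ds[:, x] * ds[:, y] * ds[:, z])

def connected_4(samples, x, y, z, u):
    """<s_x s_y s_z s_u>^c from Eq. (35): full central moment minus Gaussian pairings."""
    ds = samples - samples.mean(axis=0)

    def c2(a, b):
        return np.mean(ds[:, a] * ds[:, b])

    full = np.mean(ds[:, x] * ds[:, y] * ds[:, z] * ds[:, u])
    return full - c2(x, y) * c2(z, u) - c2(x, z) * c2(y, u) - c2(x, u) * c2(y, z)

rng = np.random.default_rng(3)
gauss = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=1_000_000)
expo = rng.exponential(1.0, size=(1_000_000, 1))
```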

The assumption that the Hamiltonian can be Taylor-Fréchet expanded in the signal field permits us to write

$$H[s] = H_0 - j^\dagger s + \frac{1}{2}\, s^\dagger D^{-1} s + \sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}. \qquad (36)$$

Repeated coordinates are thought to be integrated over. The first three Taylor coefficients have special roles. The constant $H_0$ is fixed by the normalization condition of the joint probability density of signal and data. If $H'[s]$ denotes some unnormalised Hamiltonian, its normalization constant is given by

$$H_0 = \log \int \mathcal{D}s \int \mathcal{D}d\; e^{-H'[s]}. \qquad (37)$$



Often $H_0$ is irrelevant, unless different models or hyperparameters are to be compared.

We call the linear coefficient $j$ the information source. This term is usually directly and linearly related to the data. The quadratic coefficient, $D^{-1}$, defines the information propagator $D(x,y)$, which propagates information on the signal at $y$ to location $x$, and thereby permits, e.g., a partial reconstruction of the signal at locations where no data were taken. Finally, the anharmonic tensors $\Lambda^{(n)}$ create interactions between the modes of the free, harmonic theory. Since this free theory will be the basis for the full interaction theory, we first investigate the case $\Lambda^{(n)} = 0$.
B. Free theory

1. Gaussian data model 

For our simplest data model, we assume a Gaussian signal with prior

$$P(s) = \mathcal{G}(s, S) \equiv \frac{1}{\sqrt{|2\pi S|}}\, \exp\left(-\frac{1}{2}\, s^\dagger S^{-1} s\right), \qquad (38)$$

where $S = \langle s\, s^\dagger \rangle$ is the signal covariance. The signal is assumed here to be processed by nature and our measurement device according to a linear data model

$$d = R\, s + n. \qquad (39)$$

Here, the response $R[s] = R\, s$ is linear in $s$, and the noise $n_s = n$ is independent of the signal $s$. The linear response matrix $R$ of our instrument can contain window and selection functions, blurring effects, and even a Fourier transformation of the signal space, if our instrument is an interferometer. Typically, the data space is discrete, whereas the signal space may be continuous. In that case, the $i$-th data point is given by

$$d_i = \int dx\, R_i(x)\, s(x) + n_i. \qquad (40)$$

We assume, for the moment, but not in general, the noise to be signal-independent and Gaussian, and therefore distributed as

$$P(n|s) = \mathcal{G}(n, N), \qquad (41)$$

where $N = \langle n\, n^\dagger \rangle$ is the noise covariance matrix. Since the noise is just the difference between the data and the signal response, $n = d - R\, s$, the likelihood of the data is given by

$$P(d|s) = P(n = d - R\, s\,|\,s) = \mathcal{G}(d - R\, s, N), \qquad (42)$$

and thus the Hamiltonian of the Gaussian theory is

$$H_\mathcal{G}[s] = -\log\left[P(d|s)\,P(s)\right] = -\log\left[\mathcal{G}(d - R\, s, N)\, \mathcal{G}(s, S)\right] = \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0^\mathcal{G}. \qquad (43)$$

Here

$$D = \left[S^{-1} + R^\dagger N^{-1} R\right]^{-1} \qquad (44)$$

is the propagator of the free theory. The information source,

$$j = R^\dagger N^{-1} d, \qquad (45)$$

depends linearly on the data in a response-over-noise weighted fashion and reads

$$j(x) = \sum_{i,k} R_i^*(x)\, N^{-1}_{ik}\, d_k \qquad (46)$$

in case of discrete data but continuous signal spaces. Finally,

$$H_0^\mathcal{G} = \frac{1}{2}\, d^\dagger N^{-1} d + \frac{1}{2} \log\left(|2\pi S|\, |2\pi N|\right) \qquad (47)$$

has absorbed all $s$-independent normalization constants.
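The partition function of the free field theory,

$$Z_\mathcal{G}[J] = \int \mathcal{D}s\, e^{-H_\mathcal{G}[s] + J^\dagger s}, \qquad (48)$$

is a Gaussian path integral, which can be calculated exactly, yielding

$$Z_\mathcal{G}[J] = \sqrt{|2\pi D|}\, \exp\left\{\frac{1}{2}\, (J + j)^\dagger D\, (J + j) - H_0^\mathcal{G}\right\}. \qquad (49)$$

The explicit partition function permits us to calculate via Eq. (31) the expectation of the signal given the data, in the following called the map $m_d$ generated by the data $d$:

$$m_d = \langle s \rangle_d = \left. \frac{\delta \log Z_\mathcal{G}}{\delta J} \right|_{J=0} = D\, j = \left[S^{-1} + R^\dagger N^{-1} R\right]^{-1} R^\dagger N^{-1} d. \qquad (50)$$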



The last expression shows that the map is given by the data after applying a generalized Wiener filter, $m_d = F_\mathrm{WF}\, d$ with $F_\mathrm{WF} = \left[S^{-1} + R^\dagger N^{-1} R\right]^{-1} R^\dagger N^{-1}$. The propagator $D(x,y)$ describes how the information on the density field contained in the data at location $x$ propagates to position $y$: $m(y) = \int dx\, D(y,x)\, j(x)$.
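The following is a small dense-matrix sketch of Eqs. (44), (45) and (50). It assumes NumPy. The exponential prior correlation, the masked pixel range standing in for an incomplete survey, and the white-noise level are all hypothetical choices. As a consistency check, the code also evaluates the equivalent data-space form $m_d = S R^\dagger (R S R^\dagger + N)^{-1} d$ of the same Wiener filter.

```python
import numpy as np

rng = np.random.default_rng(4)
n_pix = 64
x = np.arange(n_pix)

# Assumed prior: stationary Gaussian signal with exponential correlation function.
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 8.0)

# Assumed response: pixels 20..39 are unobserved (an incomplete survey).
observed = np.ones(n_pix, dtype=bool)
observed[20:40] = False
R = np.eye(n_pix)[observed]                  # shape (n_data, n_pix)
N = 0.1 * np.eye(R.shape[0])                 # white noise covariance

s = rng.multivariate_normal(np.zeros(n_pix), S)
d = R @ s + rng.multivariate_normal(np.zeros(R.shape[0]), N)

N_inv = np.linalg.inv(N)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)    # propagator, Eq. (44)
j = R.T @ N_inv @ d                                      # information source, Eq. (45)
m = D @ j                                                # map, Eq. (50)

# Equivalent data-space form of the same Wiener filter.
m_alt = S @ R.T @ np.linalg.solve(R @ S @ R.T + N, d)
```

The map is non-zero inside the gap because $D$ propagates information from the observed neighbours, while the diagonal of $D$, the posterior variance, is largest where no data were taken.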

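The connected autocorrelation of the signal given the data,

$$\langle s\, s^\dagger \rangle^\mathrm{c}_d = D = \left[S^{-1} + R^\dagger N^{-1} R\right]^{-1}, \qquad (51)$$

is the propagator itself. All higher connected correlation functions are zero. Therefore, the signal given the data is a Gaussian random field around the mean $m_d$, with the variance of the residual error

$$r = s - m_d \qquad (52)$$

provided by the propagator itself, as a straightforward calculation shows:

$$\langle r\, r^\dagger \rangle_d = \langle s\, s^\dagger \rangle_d - \langle s \rangle_d\, \langle s^\dagger \rangle_d = \langle s\, s^\dagger \rangle^\mathrm{c}_d = D. \qquad (53)$$

Thus, the posterior should simply be a Gaussian given by

$$P(s|d) = \mathcal{G}(s - m_d, D). \qquad (54)$$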
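As a test of the latter equation, we calculate the evidence of the free theory via

$$P(d) = \frac{P(d|s)\,P(s)}{P(s|d)} = \frac{\mathcal{G}(d - R\, s, N)\, \mathcal{G}(s, S)}{\mathcal{G}(s - D\, j, D)} = \sqrt{\frac{|2\pi D|}{|2\pi S|\, |2\pi N|}}\, \exp\left\{\frac{1}{2}\left(j^\dagger D\, j - d^\dagger N^{-1} d\right)\right\}, \qquad (55)$$

which is indeed independent of $s$ and also identical to $Z_\mathcal{G}[0]$, as it should be.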
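The same consistency test can be run numerically. In this sketch (NumPy assumed; the random covariance, response and data are hypothetical), the ratio $P(d|s)P(s)/P(s|d)$ of Eq. (55) is evaluated at arbitrary signal configurations. It is compared with the closed form $\log Z_\mathcal{G}[0]$ and with the direct marginal $\mathcal{G}(d, R S R^\dagger + N)$ of the linear Gaussian model.

```python
import numpy as np

def log_gauss(x, C):
    """log G(x, C) = -x^T C^{-1} x / 2 - log|2 pi C| / 2 for real vectors."""
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C)
    return -0.5 * x @ np.linalg.solve(C, x) - 0.5 * logdet

rng = np.random.default_rng(5)
n_s, n_d = 5, 4
A = rng.standard_normal((n_s, n_s))
S = A @ A.T + n_s * np.eye(n_s)          # hypothetical signal covariance
R = rng.standard_normal((n_d, n_s))      # hypothetical linear response
N = 0.5 * np.eye(n_d)                    # white noise covariance
d = rng.standard_normal(n_d)             # the identity holds for any data vector

N_inv = np.linalg.inv(N)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)
j = R.T @ N_inv @ d

def log_evidence_via_bayes(s):
    """log P(d|s) + log P(s) - log P(s|d); by Eq. (55) this must not depend on s."""
    return log_gauss(d - R @ s, N) + log_gauss(s, S) - log_gauss(s - D @ j, D)

# Closed form of Eq. (55), i.e. log Z_G[0].
_, logdet_D = np.linalg.slogdet(2.0 * np.pi * D)
_, logdet_S = np.linalg.slogdet(2.0 * np.pi * S)
_, logdet_N = np.linalg.slogdet(2.0 * np.pi * N)
log_Z0 = 0.5 * (logdet_D - logdet_S - logdet_N) + 0.5 * (j @ D @ j - d @ N_inv @ d)
```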



2. Free classical theory 

The Hamiltonian permits us to ask for classical equations derived from an extremal principle. This is justified, on the one hand, as being just the result of the saddle-point approximation of the exponential in the partition function. On the other hand, the extremal principle is equivalent to the maximum a posteriori (MAP) estimator, which is quite commonly used for the construction of signal filters. An exhaustive introduction to and discussion of the MAP approximation for Gaussian and non-Gaussian signal fields is provided by Lemm [48].

The classical theory is expected to capture essential 
features of the field theory. However, if the field fluctua- 
tions are able to probe phase space regions away from the 
maximum in which the Hamiltonian (or posterior) has a 
more complex structure, deviations between classical and 
field theory should become apparent. 

Extremizing the Hamiltonian of the free theory (Eq. (43)),

$$\frac{\delta H_\mathcal{G}}{\delta s} = D^{-1} s - j = 0, \qquad (56)$$

we get the classical mapping equation $m = D\, j$, which is identical to the field-theoretical result (Eq. (50)).
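The classical mapping equation is the linear system $D^{-1} m = j$, so map-making codes typically solve it iteratively rather than by inverting matrices. Below is a minimal conjugate-gradient sketch. It assumes NumPy, and the toy prior, mask and noise level are hypothetical. The solver only needs the action of $D^{-1} = S^{-1} + R^\dagger N^{-1} R$ on a vector. Conjugate gradients minimize the quadratic form $\frac{1}{2} m^\dagger D^{-1} m - j^\dagger m$, which is $H_\mathcal{G}$ up to the constant $H_0^\mathcal{G}$, so the result is the MAP estimate.

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical toy problem: exponential prior correlation, partial coverage, white noise.
n_pix = 50
x_pix = np.arange(n_pix)
S_inv = np.linalg.inv(np.exp(-np.abs(x_pix[:, None] - x_pix[None, :]) / 5.0))
mask = (x_pix < 15) | (x_pix >= 30)          # pixels 15..29 are unobserved
noise_var = 0.1
data = np.sin(x_pix / 6.0)[mask]             # synthetic data on the observed pixels

def apply_D_inv(v):
    """Action of D^{-1} = S^{-1} + R^T N^{-1} R, with R selecting observed pixels."""
    Rt_Ninv_R_v = np.zeros_like(v)
    Rt_Ninv_R_v[mask] = v[mask] / noise_var
    return S_inv @ v + Rt_Ninv_R_v

j = np.zeros(n_pix)
j[mask] = data / noise_var                   # information source R^T N^{-1} d
m = conjugate_gradient(apply_D_inv, j)       # MAP = Wiener-filter map m = D j
```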

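It is also possible to measure the sharpness of the maximum of the posterior by calculating the Hessian curvature matrix

$$H''_\mathcal{G}[m] \equiv \left. \frac{\delta^2 H_\mathcal{G}[s]}{\delta s\, \delta s^\dagger} \right|_{s=m} = D^{-1}. \qquad (57)$$

In the Gaussian approximation of the maximum of the posterior, the inverse of the Hessian is identical to the covariance of the residual,

$$\left\langle (s - m)(s - m)^\dagger \right\rangle_d \approx \left(H''_\mathcal{G}[m]\right)^{-1} = D, \qquad (58)$$

which for the pure Gaussian model is of course identical to the exact result, as given by the field theory (Eq. (53)).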


IV. INTERACTING INFORMATION FIELDS 



A. Interaction Hamiltonian



1. General Form 

All results of the free theory presented so far are well- 
known within the field of signal reconstruction. IFT re- 
produces them elegantly, and is therefore of pedagogical 
value. However, the new results presented in the rest 
of this paper arise as soon as one leaves the free theory. 
Non-Gaussian signal or noise, a non-linear response, or 
a signal dependent noise create anharmonic terms in the 
Hamiltonian. These describe interactions between the 
eigenmodes of the free Hamiltonian. 

We assume the Hamiltonian can be Taylor expanded in the signal fields, which permits us to write

$$H[s] = \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0 + \underbrace{\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}}_{H_\mathrm{int}[s]}. \qquad (59)$$

Repeated coordinates are thought to be integrated over. In contrast to Eq. (36), we have now included perturbations which are constant, linear, and quadratic in the signal field, because we are summing from $n = 0$. This permits us to treat certain non-ideal effects perturbatively. For example, if a mostly position-independent propagator gets a small position-dependent contamination, it might be more convenient to treat the latter perturbatively and not to include it in the propagator used in the calculation. Note further that all coefficients can be assumed to be symmetric with respect to their coordinate indices.^

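Often, it is more convenient to work with a shifted field $\phi = s - t$, where some (e.g. background) field $t$ is removed from $s$. The Hamiltonian of $\phi$ reads

$$H'[\phi] = \frac{1}{2}\, \phi^\dagger D^{-1} \phi - j'^\dagger \phi + H'_0 + \sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda'^{(n)}_{x_1 \ldots x_n}\, \phi_{x_1} \cdots \phi_{x_n}, \qquad (60)$$

with $H'_0 = H_0 + \frac{1}{2}\, t^\dagger D^{-1} t - j^\dagger t$, $j' = j - D^{-1} t$, and

$$\Lambda'^{(m)}_{x_1 \ldots x_m} = \sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(m+n)}_{x_1 \ldots x_m y_1 \ldots y_n}\, t_{y_1} \cdots t_{y_n}.$$

^ This means $D^{-1}_{xy} = D^{-1}_{yx}$ and $\Lambda^{(n)}_{x_{\pi(1)} \ldots x_{\pi(n)}} = \Lambda^{(n)}_{x_1 \ldots x_n}$ with $\pi$ any permutation of $\{1, \ldots, n\}$, since even non-symmetric coefficients would automatically be symmetrized by the integration over all repeated coordinates. Therefore, we assume in the following that such a symmetrization operation has already been done, or we impose it by hand before we continue with any perturbative calculation by applying

$$\Lambda^{(n)}_{x_1 \ldots x_n} \to \frac{1}{n!} \sum_{\pi \in \mathcal{P}_n} \Lambda^{(n)}_{x_{\pi(1)} \ldots x_{\pi(n)}}.$$

This clearly leaves any symmetric tensor invariant if $\mathcal{P}_n$ is the space of all permutations of $\{1, \ldots, n\}$.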

2. Feynman rules 

Since all the information on any correlation function of the fields is contained in the partition sum and can be extracted from it, only the latter needs to be calculated:

$$Z[J] = \int \mathcal{D}s\, e^{-H[s] + J^\dagger s} = \exp\left(-\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, \frac{\delta}{\delta J_{x_1}} \cdots \frac{\delta}{\delta J_{x_n}}\right) \int \mathcal{D}s\, e^{-\frac{1}{2} s^\dagger D^{-1} s + (J + j)^\dagger s - H_0} = \exp\left(-H_\mathrm{int}\!\left[\frac{\delta}{\delta J}\right]\right) Z_\mathcal{G}[J].$$

There exist well-known diagrammatic expansion techniques for such expressions [e.g. 53]. The expansion terms of the logarithm of the partition sum, from which any connected moments can be calculated, are represented by all possible connected diagrams built out of lines and vertices (with one, two, three, or more legs connecting to lines), and without any external line ends (any line ends in a vertex). These diagrams are interpreted according to the following Feynman rules:

1. Open ends of lines in diagrams correspond to external coordinates and are labeled by such. Since the partition sum in particular does not depend on any external coordinate, it is calculated only from summing up closed diagrams. However, the field expectation value $m(x) = \langle s(x) \rangle_{(s|d)} = \delta \log Z[J]/\delta J(x)|_{J=0}$ and higher-order correlation functions depend on coordinates and are therefore calculated from diagrams with one or more open ends, respectively.

2. A line with coordinates $x'$ and $y'$ at its ends represents the propagator $D_{x'y'}$ connecting these locations.

3. Vertices with one leg get an individual internal, integrated coordinate $x'$ and represent the term $j_{x'} + J_{x'} - \Lambda^{(1)}_{x'}$.

4. Vertices with $n$ legs represent the term $-\Lambda^{(n)}_{x'_1 \ldots x'_n}$, where each individual leg is labeled by one of the internal coordinates $x'_1, \ldots, x'_n$. This more complex vertex structure, as compared to QFT, is a consequence of non-locality in IFT.

5. All internal (and therefore repeatedly occurring) coordinates are integrated over, whereas external coordinates are not.

6. Every diagram is divided by its symmetry factor, the number of permutations of vertex legs leaving the topology invariant, as described in any book on field theory [e.g. 13].

The $n$-th connected moment of $s$ is generated by taking the $n$-th derivative of $\log Z[J]$ with respect to $J$ and then setting $J = 0$. This corresponds to removing $n$ end vertices from all diagrams. For example, the first four diagrams contributing to a map ($m = \langle s \rangle_{(s|d)}$) are

$$D\, j = D_{xy}\, j_y = \int dy\, D(x,y)\, j(y),$$
$$-\frac{1}{2}\, D\, \Lambda^{(3)}[D] = -\frac{1}{2}\, D_{xy}\, \Lambda^{(3)}_{yzu}\, D_{zu} = -\frac{1}{2} \int dy\, D_{xy} \int dz \int du\, \Lambda^{(3)}_{yzu}\, D_{zu}, \qquad (61)$$

$$-\frac{1}{2}\, D_{xy}\, \Lambda^{(3)}_{yuz}\, D_{zz'}\, j_{z'}\, D_{uu'}\, j_{u'} = -\frac{1}{2} \int dy\, D_{xy} \int dz \int du\, \Lambda^{(3)}_{yuz} \int dz'\, D_{zz'}\, j_{z'} \int du'\, D_{uu'}\, j_{u'}, \quad \text{and}$$
$$-\frac{1}{2}\, D_{xy}\, \Lambda^{(4)}_{yzuv}\, D_{zu}\, D_{vv'}\, j_{v'} = -\frac{1}{2} \int dy\, D_{xy} \int dz \int du \int dv\, \Lambda^{(4)}_{yzuv}\, D_{zu} \int dv'\, D_{vv'}\, j_{v'}. \qquad (62)$$

Here we have assumed that any first- and second-order perturbation was absorbed into the data source and the propagator, thus $\Lambda^{(1)} = \Lambda^{(2)} = 0$. Repeated indices are assumed to be integrated (or summed) over.
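The map diagrams of Eqs. (61) and (62) can be tested in the simplest possible setting, a single pixel. There, propagator, source and vertices are plain numbers, and the posterior mean can be computed by brute-force quadrature. This sketch assumes NumPy, and the values of `D`, `j`, `lam3` and `lam4` are arbitrary small test choices. It uses $H(s) = s^2/(2D) - j s + \lambda_3 s^3/6 + \lambda_4 s^4/24$ and adds the first-order tree diagram $-\frac{1}{6} D \lambda_4 (Dj)^3$, which is not among the four listed above. The diagram sum should then agree with the exact mean up to corrections of second order in the couplings.

```python
import numpy as np

D, j = 1.0, 0.5           # single-pixel propagator and information source (test values)
lam3, lam4 = 0.01, 0.005  # small local couplings, Lambda^(1) = Lambda^(2) = 0

def hamiltonian(s):
    return s**2 / (2.0 * D) - j * s + lam3 * s**3 / 6.0 + lam4 * s**4 / 24.0

# Brute force: posterior mean <s> = int ds s e^{-H} / int ds e^{-H} on a fine grid.
s_grid = np.linspace(-15.0, 15.0, 30_001)
weights = np.exp(-hamiltonian(s_grid))
m_exact = np.sum(s_grid * weights) / np.sum(weights)

# Diagrams: D j, the lam3 tadpole, the lam3 tree, the lam4 loop, plus the lam4 tree.
m_free = D * j
m_pert = (m_free
          - 0.5 * D * lam3 * D                 # -1/2 D Lambda3 [D]
          - 0.5 * D * lam3 * m_free**2         # -1/2 D Lambda3 (Dj)(Dj)
          - 0.5 * D * lam4 * D * m_free        # -1/2 D Lambda4 D (Dj)
          - D * lam4 * m_free**3 / 6.0)        # -1/6 D Lambda4 (Dj)^3
```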

3. Local interactions and Fourier space rules 

In case of purely local interactions,

$$\Lambda^{(n)}_{x_1 \ldots x_n} = \lambda_n(x_1)\, \delta(x_1 - x_2) \cdots \delta(x_1 - x_n), \qquad (63)$$

the interaction Hamiltonian reads

$$H_\mathrm{int}[s] = \sum_{n=0}^{\infty} \frac{1}{n!} \int dx\, \lambda_n(x)\, s^n(x), \qquad (64)$$

and the expressions of the Feynman diagrams simplify considerably. The fourth Feynman rule can be replaced by:

4. Vertices with $n$ lines connected to them are associated with a single internal coordinate $x'$ and represent the term $-\lambda_n(x')$.

For example, the last loop diagram in Eq. (62) becomes

$$-\frac{1}{2} \int dy\, D_{xy}\, \lambda_4(y)\, D_{yy} \int dz\, D_{yz}\, j_z. \qquad (65)$$

In case of local interactions, it can be helpful to do the calculations in Fourier space, for which the Feynman rules can be obtained by inserting a real-space identity operator $\mathbb{1} = F^{-1} F$ in between any scalar product and assigning the inverse Fourier transformation $F^{-1}$ to the left and the forward transform $F$ to the right term, e.g.

$$D\, j = F^{-1} F\, D\, F^{-1} F\, j = F^{-1} D'\, j', \quad \text{with } D' = F D F^{-1} \text{ and } j' = F j.$$

This yields:

1. An open end of a line has an external momentum coordinate $k$, and gets an $\int dk\, e^{-ikx}/(2\pi)^n$ applied to it, if real-space functions are to be evaluated.

2. A line connecting momentum $k$ with momentum $k'$ corresponds to a directed propagator between these momenta: $D_{kk'} = D(k,k')$.

3. A data source vertex is $(j + J - \lambda_1)(k'')$, where $k''$ is the momentum at the data end of the line.

4. A vertex with $m > 1$ lines with momentum labels $k_1, \ldots, k_m$ is $-\lambda_m(k_0)\, (2\pi)^n\, \delta\!\left(\sum_{i=0}^{m} k_i\right)$.

5. An internal end of a line has an internal (integrated) momentum coordinate $k'$. Integration means a term $\int dk'/(2\pi)^n$ in front of the expression.

6. The expression gets divided by the symmetry factor of its diagram.

Here, $j(k) = (F j)(k) \equiv \int dx\, j(x)\, e^{ikx}$, $D(k,k') = (F D F^\dagger)(k,k') \equiv \int dx \int dx'\, D(x,x')\, e^{i(kx - k'x')}$, etc., are the Fourier-transformed information source, propagator, etc., respectively.

Note that momentum directions have to be taken into account. The momenta that go into a vertex, data source, or open end get a positive sign in the delta function of momentum conservation; the ones that go out of a vertex get a minus sign.



4. Simplistic interaction Hamiltonians

In order to have a toy case which permits analytic calculations, we introduce a simplistic Hamiltonian by requiring the data model to be translationally invariant and all interaction terms to be local. This is the case whenever the signal and noise covariances are fully characterized by power spectra over the same space,

$$S(k,q) = (2\pi)^n\, \delta(k - q)\, P_S(k), \qquad (66)$$
$$N(k,q) = (2\pi)^n\, \delta(k - q)\, P_N(k), \qquad (67)$$

with $P_S(k) = \langle |s(k)|^2 \rangle / V$ and $P_N(k) = \langle |n(k)|^2 \rangle / V$, where $V$ is the volume of the system. We assume further that the signal processing can be completely described by a convolution with an instrumental beam,

$$d(x) = \int dy\, R(x - y)\, s(y) + n(x), \qquad (68)$$

where the response convolution kernel has a Fourier power spectrum $P_R(k) = |R(k)|^2$ (no factor $1/V$). In this case, $D$ can be fully described by a power spectrum:

$$D(k,q) = (2\pi)^n\, \delta(k - q)\, P_D(k), \qquad (69)$$

with $P_D(k) = \left[P_S^{-1}(k) + P_R(k)\, P_N^{-1}(k)\right]^{-1}$.
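For a translationally invariant data model, the propagator is diagonal in Fourier space, and the Wiener filter reduces to a multiplication by $P_D(k) R^*(k)/P_N(k)$. The sketch below works on a discrete periodic grid and assumes NumPy. The power spectra and the Gaussian beam are hypothetical. On such a grid, the "power spectra" used below are simply the eigenvalues of the circulant covariance matrices, a normalization convention chosen for this illustration. The code compares the Fourier-space result with the dense-matrix formula of Eq. (50).

```python
import numpy as np

n = 128
k = np.fft.fftfreq(n) * n                        # integer wavenumbers, periodic grid
P_S = 1.0 / (1.0 + (k / 8.0) ** 2) ** 2          # hypothetical signal power spectrum
R_k = np.exp(-0.5 * (k / 10.0) ** 2)             # hypothetical Gaussian beam R(k)
P_N = 0.05 * np.ones(n)                          # white noise power spectrum

P_D = 1.0 / (1.0 / P_S + np.abs(R_k) ** 2 / P_N)  # Eq. (69)

rng = np.random.default_rng(6)
d = rng.standard_normal(n)                       # any real data vector

# Fourier-space Wiener filter m(k) = P_D(k) R*(k) d(k) / P_N(k).
m_fourier = np.fft.ifft(P_D * np.conj(R_k) * np.fft.fft(d) / P_N).real

def circulant(eigenvalues):
    """Real-space matrix that is diagonal in Fourier space with given eigenvalues."""
    return np.fft.ifft(eigenvalues[:, None] * np.fft.fft(np.eye(n), axis=0), axis=0).real

S, R, N = circulant(P_S), circulant(R_k), circulant(P_N)
m_dense = np.linalg.solve(np.linalg.inv(S) + R.T @ np.linalg.inv(N) @ R,
                          R.T @ np.linalg.solve(N, d))
```

The spectrum `P_D` built this way also satisfies $P_D \le P_S$ and $P_D \le P_N/P_R$ mode by mode.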

The locality of the interaction terms requires $\lambda_m = \mathrm{const}$ besides translational invariance, and therefore the interaction Hamiltonian reads

$$H_\mathrm{int}[s] = \sum_{m=1}^{\infty} \frac{\lambda_m}{m!} \int dx\, s^m(x) = \sum_{m=1}^{\infty} \frac{\lambda_m}{m!} \int \left(\prod_{i=1}^{m} \frac{dk_i}{(2\pi)^n}\right) (2\pi)^n\, \delta\!\left(\sum_{i=1}^{m} k_i\right) \prod_{j=1}^{m} s(k_j). \qquad (70)$$

In that case, the Feynman rules simplify considerably. For the interaction Hamiltonian of Eq. (70), the Feynman rules are now:

1. unintegrated $x$-coordinate: $\exp(-ikx)$ (if real-space functions are to be evaluated),

2. propagator: $P_D(k)$,

3. data source vertex: $(j + J - \lambda_1)(k)$,

4. vertex with $m > 1$ lines: $-\lambda_m$,

5. impose momentum conservation at each vertex, $(2\pi)^n\, \delta\!\left(\sum_i k_i\right)$, and integrate over every internal momentum, $\int dk/(2\pi)^n$,

6. and divide by the symmetry factor.

5. Feynman rules on the sphere 

For CMB reconstruction and analysis, but presumably also for terrestrial applications, the Feynman rules on the sphere $\Omega = \mathcal{S}^2$ are needed and are therefore provided in Appendix B.
B. Normalisability of the theory

In contrast to QFT, IFT should be properly normalized and not necessarily require any renormalization procedure. The reason is that IFT is not a low-energy limit of some unknown high-energy theory, but can be set up as the full (high-energy) theory. The Hamiltonian is just the negative logarithm of the joint probability function of data and signal, $H_d[s] = -\log[P(d,s)]$, and is therefore well defined and properly normalized if the latter is. Only if ad-hoc Hamiltonians are set up, or if approximations lead to ill-normalized theories, should normalization be an issue.

However, since we are attempting a perturbative expansion of the theory, there is no guarantee that all individual terms provide finite results. For example, in QFT simple loop diagrams are known to be divergent and require renormalization. In the following, we investigate a simplistic but representative case of IFT, which shows that such problems are generally not to be expected.

Let us adopt the simplistic situation described in Sec. IV A 4 and estimate a simple loop diagram, for which we assume for notational convenience $\lambda_3 = -2\,(2\pi)^n \lambda'$ (with $\lambda' > 0$):

$$-\frac{1}{2}\, D\, \lambda_3\, \hat{D} = \lambda' \int dk \int dk'\, \delta(k + k' - k')\, P_D(k)\, P_D(k')\, e^{-ikx} = \lambda' P_D(0) \int dk'\, P_D(k') \le \lambda' P_D(0) \int dk'\, P_S(k') = \lambda' V P_D(0)\, \langle s^2(x) \rangle, \qquad (71)$$

where $V$ is the volume of the system. Here and in the following, $\hat{C}$ denotes the diagonal of the matrix $C$.

Thus, as long as the signal field is of bounded variance, the loop diagram is convergent, because $P_D \le P_S$ for all $k$. Even a signal of unbounded variance would not lead to a divergent loop diagram if $\int dk\, (P_N/P_R)(k)$ is finite, since we also have $P_D \le P_N/P_R$. A bounded-variance signal is very natural, especially in a cosmological setting.^

Finally, since a signal as an information field can be chosen freely, we can define it to be the filtered version of the physical field (e.g. dark matter distribution or CMB fluctuations), so that only modes of sufficiently bounded variance are present in it. Since we have the freedom to choose information fields which are mathematically well behaved, we can therefore ensure convergence of expressions.

Although this is not a general proof of normalisability of the theory, which is beyond the scope of this paper, it should provide confidence in the well-behavedness of the formalism in sensible applications. The price to be paid for this well-behavedness is the more complex structure of the propagator, which, in contrast to QFT, can be non-analytical and require numerical evaluation even in simplistic cases.

The cosmological signal of primary interest, the initial density fluctuations as revealed by the large-scale structure and the CMB, is expected to exhibit a suppression of small-scale power due to the free-streaming of dark matter particles before they became non-relativistic. The CMB temperature fluctuations are also damped on small scales, due to the free streaming of photons around the time of recombination.



C. Expansion around the classical solution 

1. General case 

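The classical solution of the Hamiltonian in Eq. [55] is provided by its minimum:

$$
\left.\frac{\delta H}{\delta s_x}\right|_{s=s_{\rm cl}} = D^{-1}_{xy}\, s_{{\rm cl},y} - j_x + \sum_{m=1}^{\infty} \frac{1}{(m-1)!}\, \Lambda^{(m)}_{x x_1 \dots x_{m-1}}\, s_{{\rm cl},x_1} \cdots s_{{\rm cl},x_{m-1}} = 0 .
\tag{72}
$$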



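This leads to the equation for the classical field

$$
s_{{\rm cl},x} = D_{xy} \left( j_y - \sum_{m=1}^{\infty} \frac{1}{(m-1)!}\, \Lambda^{(m)}_{y x_1 \dots x_{m-1}}\, s_{{\rm cl},x_1} \cdots s_{{\rm cl},x_{m-1}} \right),
\tag{73}
$$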



which one can try to solve iteratively. 



2. Local interactions 

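For simplicity, we concentrate for a moment on the case of purely local interactions, for which the equation for the classical field $s_{\rm cl}$ is

$$
s_{\rm cl} = D \left( j - \sum_{m=0}^{\infty} \frac{1}{m!}\, \Lambda^{(m+1)}\, s_{\rm cl}^{m} \right).
\tag{74}
$$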



Iterating this equation and rewriting the resulting terms as Feynman diagrams shows that the classical solution contains the tree diagrams. The loop diagrams can be added by investigating the non-classical uncertainty field $\phi = s - s_{\rm cl}$.
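The iteration of Eq. [74] can be sketched in a few lines. The example below assumes a single local cubic vertex $\Lambda^{(3)}$ (all other $\Lambda^{(n)}$ set to zero), a toy tridiagonal inverse propagator and an arbitrary source $j$, all of which are illustrative choices. The first iterate is the free map $Dj$, the second adds the lowest tree diagram $-\tfrac{1}{2} D\, \Lambda^{(3)} (Dj)^2$, and further sweeps add higher tree diagrams.

```python
import numpy as np

n = 64
x = np.linspace(0.0, 1.0, n)
# Toy positive-definite inverse propagator (tridiagonal) and its inverse D
D_inv = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D = np.linalg.inv(D_inv)
j = 0.1 * np.sin(2.0 * np.pi * x)     # illustrative information source
lam3 = 0.5 * np.ones(n)               # local cubic vertex Lambda^(3)(x), illustrative

def classical_field(j, D, lam3, n_iter=500, tol=1e-13):
    """Fixed-point iteration of Eq. (74) with only Lambda^(3) non-zero:
    s <- D (j - Lambda^(3) s^2 / 2)."""
    s = np.zeros_like(j)
    for _ in range(n_iter):
        s_new = D @ (j - 0.5 * lam3 * s**2)
        if np.max(np.abs(s_new - s)) < tol:
            return s_new
        s = s_new
    raise RuntimeError("iteration did not converge")

s_cl = classical_field(j, D, lam3)
m0 = D @ j                                     # free-theory map (first iterate)
m_tree = m0 - D @ (0.5 * lam3 * m0**2)         # second iterate: lowest tree correction
grad = D_inv @ s_cl - j + 0.5 * lam3 * s_cl**2  # dH/ds at s_cl, should vanish
print(np.max(np.abs(grad)))
```

For strong interactions such a plain iteration can diverge, which is why regularised schemes are discussed later for the Poissonian case.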

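A non-classical expansion of the information field around the classical field is possible by inserting $s = s_{\rm cl} + \phi$ into the Hamiltonian (Eq. [M]). Reordering terms according to the powers of the field $\phi$ leads to its Hamiltonian

$$
H'[\phi] = H'_0 + \tfrac{1}{2}\, \phi^\dagger D'^{-1} \phi - j'^\dagger \phi + \sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda'^{(n)}_{x_1 \dots x_n}\, \phi_{x_1} \cdots \phi_{x_n}, \quad\text{with}
$$

$$
\begin{aligned}
\Lambda'^{(n)}_{x_1 \dots x_n} &= \sum_{m=0}^{\infty} \frac{1}{m!}\, \Lambda^{(n+m)}_{x_1 \dots x_n y_1 \dots y_m}\, s_{{\rm cl},y_1} \cdots s_{{\rm cl},y_m}, \\
H'_0 &= H[s_{\rm cl}] = H_0 - j^\dagger s_{\rm cl} + \tfrac{1}{2}\, s_{\rm cl}^\dagger D^{-1} s_{\rm cl} + \Lambda'^{(0)}, \\
j' &= j - \Lambda'^{(1)} - D^{-1} s_{\rm cl}, \quad\text{and}\quad D' = \big(D^{-1} + \Lambda'^{(2)}\big)^{-1}.
\end{aligned}
\tag{75}
$$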
In case $s_{\rm cl}$ is exactly the classical solution, Eqs. [74] and [75] imply that $j' = 0$. Thus, there are no vertices with a single line (information sources) in any Feynman graph of the $\phi$-theory, and only loop diagrams contribute uncertainty corrections to any information-theoretical estimator. For example, the uncertainty corrections to the classical map estimator are given by



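$$
m_d - s_{\rm cl} = \langle \phi \rangle_{(\phi|d)} = [\text{loop diagrams of the } \phi\text{-theory with one open end}].
\tag{76}
$$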



However, in case $s_{\rm cl}$ is not (exactly) the classical solution, be it due to a truncation error of an iteration scheme used to solve for the classical field, or because $s_{\rm cl}$ was chosen for a completely different purpose, Eq. [75] provides the correct field theory for $\phi = s - s_{\rm cl}$, independent of the nature of $s_{\rm cl}$. In case of a truncation error, incorporating diagrams with data-source terms $j'$ into any computation permits correcting the inaccuracy of $s_{\rm cl}$ in a systematic way.
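The bookkeeping behind Eq. [75] can be checked on a single-pixel (scalar) toy Hamiltonian with polynomial interactions up to fourth order. The coefficients below are arbitrary illustrative values, and the Newton solver is just one convenient way to locate the classical solution. The sketch confirms that $H[s_{\rm cl} + \phi]$ is reproduced exactly by the shifted coefficients for any reference point, and that $j'$ vanishes when the reference point is the true minimum.

```python
import math

# Scalar toy Hamiltonian H(s) = H0 - j s + s^2/(2D) + sum_n Lam[n] s^n / n!  (illustrative values)
H0, j, D = 0.3, 1.2, 0.8
Lam = {1: 0.1, 2: 0.2, 3: 0.5, 4: 0.25}
N_MAX = max(Lam)

def H(s):
    return H0 - j * s + 0.5 * s**2 / D + sum(L * s**n / math.factorial(n) for n, L in Lam.items())

def shifted(s_cl):
    """Coefficients of the phi-theory around s_cl, following Eq. (75)."""
    lam_p = [sum(Lam.get(n + m, 0.0) * s_cl**m / math.factorial(m) for m in range(N_MAX + 1))
             for n in range(N_MAX + 1)]
    j_p = j - lam_p[1] - s_cl / D
    D_p = 1.0 / (1.0 / D + lam_p[2])
    return H(s_cl), j_p, D_p, lam_p

def H_phi(phi, coeffs):
    """Hamiltonian of the uncertainty field phi = s - s_cl."""
    H0_p, j_p, D_p, lam_p = coeffs
    return (H0_p - j_p * phi + 0.5 * phi**2 / D_p
            + sum(lam_p[n] * phi**n / math.factorial(n) for n in range(3, N_MAX + 1)))

def newton_minimum(s=0.0, steps=50):
    """Newton iteration on dH/ds = 0 (H is convex for these coefficients)."""
    for _ in range(steps):
        g = -j + s / D + sum(L * s**(n - 1) / math.factorial(n - 1) for n, L in Lam.items())
        h = 1.0 / D + sum(L * s**(n - 2) / math.factorial(n - 2) for n, L in Lam.items() if n >= 2)
        s -= g / h
    return s

# 1) Exact re-expansion around an arbitrary (non-classical) reference point
coeffs = shifted(0.7)
errors = [abs(H(0.7 + phi) - H_phi(phi, coeffs)) for phi in (-1.0, -0.3, 0.4, 1.5)]

# 2) At the classical solution the linear source j' vanishes
s_cl = newton_minimum()
j_prime = shifted(s_cl)[1]
print(max(errors), j_prime)
```

For a non-classical reference point, the non-zero `j_p` is exactly the data-source term whose diagrams correct the truncation error.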



D. Boltzmann-Shannon Information 

1. Helmholtz free energy

Information fields carry information on distributed physical quantities. The amount of signal information should be measurable in information units like bits and bytes. This is possible by adopting the Boltzmann-Shannon information measure of negative entropy. The entropy of a signal probability function measures the phase-space volume available for signal uncertainties; its negative therefore quantifies how strongly the remaining uncertainties are constrained. Thus we define

$$
I_d = \int \mathcal{D}s\, P(s|d)\, \log P(s|d) = -\int \mathcal{D}s\, \frac{e^{-H[s]}}{Z}\, \big(H[s] + \log Z\big) = -\langle H[s] \rangle_d - \log Z
\tag{77}
$$

as the information measure. Introducing

$$
Z_\beta[d,J] \equiv \int \mathcal{D}s\, \exp\{-\beta\,(H[s] - J^\dagger s)\}, \quad\text{and}\quad F_\beta[d,J] \equiv -\frac{1}{\beta} \log Z_\beta[d,J],
\tag{78}
$$

of which the latter is the Helmholtz free energy as a function of the inverse temperature $\beta$, we can write

$$
I_d = -\log Z_1[d,0] - \langle H[s] \rangle_d = -\left.\frac{\partial F_\beta[d,J]}{\partial \beta}\right|_{\beta=1,\,J=0},
\tag{79}
$$

as can be verified by a direct calculation. The first expression for $I_d$ in Eq. [79] is equivalent to the well-known thermodynamic relation $F = E - T S_B$, with the internal energy $E = \langle H[s] \rangle_d$, the Boltzmann entropy $S_B = -I_d$, and the temperature, which is set here to $T = 1$. The second expression holds even if the Hamiltonian is improperly normalized; e.g. $H_0$ can be chosen arbitrarily if $Z_\beta[d,J]$ is calculated consistently with this choice.

We propose the term uncertainty corrections in order to describe the influence of the spread of the probability distribution function around its maximum. The uncertainty corrections are the information-field-theoretical equivalent of quantum corrections in quantum field theories.
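A quick numerical check of Eq. [79], assuming a scalar signal with an illustrative quartic Hamiltonian evaluated on a fine grid (the grid, coefficients and helper names are our own choices): the thermodynamic form $-\log Z_1 - \langle H \rangle_d$ and the free-energy derivative agree, both are insensitive to the additive constant $H_0$, and for a purely Gaussian Hamiltonian they reduce to $-\tfrac{1}{2}(1 + \log 2\pi D)$, the one-dimensional case of Eq. [84].

```python
import numpy as np

s = np.linspace(-15.0, 15.0, 30001)     # integration grid for a scalar signal
ds = s[1] - s[0]

def hamiltonian(H0=2.0, j=0.7, D=1.5, lam4=0.3):
    """Illustrative scalar Hamiltonian H(s) = H0 - j s + s^2/(2D) + lam4 s^4/24."""
    return H0 - j * s + 0.5 * s**2 / D + lam4 * s**4 / 24.0

def log_Z(H, beta=1.0):
    a = -beta * H
    a_max = a.max()                      # stabilise the exponential
    return a_max + np.log(np.sum(np.exp(a - a_max)) * ds)

def free_energy(H, beta):
    return -log_Z(H, beta) / beta        # F_beta at J = 0, Eq. (78)

def info_thermo(H):
    logZ = log_Z(H)
    P = np.exp(-H - logZ)                # posterior P(s|d) on the grid
    return -logZ - np.sum(P * H) * ds    # first form of Eq. (79)

def info_derivative(H, h=1e-4):
    # second form of Eq. (79), central finite difference in beta
    return -(free_energy(H, 1 + h) - free_energy(H, 1 - h)) / (2 * h)

H = hamiltonian()
print(info_thermo(H), info_derivative(H))
print(info_thermo(hamiltonian(H0=-5.0)))          # unchanged by the normalisation H0
print(info_thermo(hamiltonian(lam4=0.0)), -0.5 * (1 + np.log(2 * np.pi * 1.5)))
```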

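The Helmholtz free energy $F_\beta[d,J]$ is also the generator of all connected correlation functions of the signal, $\langle s_{x_1} \cdots s_{x_n} \rangle^{\rm c}_{(s|d)} = -\beta^{1-n}\, \delta^n F_\beta[d,J] / (\delta J_{x_1} \cdots \delta J_{x_n})\big|_{\beta=1,\,J=0}$. It can be calculated as follows: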



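$$
F_\beta[J] = -\frac{1}{\beta} \log Z^{\rm G}_\beta[J] - \frac{1}{\beta} \log \Big\langle \exp\big(-\beta H_{\rm int}[s]\big) \Big\rangle_{P^{\rm G}_\beta}, \qquad H_{\rm int} \equiv H - H_{\rm G},
\tag{80}
$$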



where the average in the last term is over the Gaussian probability function $P^{\rm G}_\beta[s] = \exp(-\beta\,(H_{\rm G}[s] - J^\dagger s))/Z^{\rm G}_\beta[J]$. This term can be calculated by using the well-known fact that the logarithm of the sum of all possible connected and unconnected diagrams with only internal coordinates (i.e. without free ends), as generated by the exponential function of the interaction terms, is given by the sum of all connected diagrams [s^l]. For example, a free theory perturbed by small interaction terms up to fourth order (all proportional to some small parameter $\gamma$) has



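$$
F_\beta[J] = H_0 + [\text{connected diagrams without open ends, up to first order in } \gamma] + \mathcal{O}(\gamma^2),
\tag{81}
$$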



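where an information source vertex reads $\beta\,(J + j - \Lambda^{(1)})$, an internal vertex with $n$ lines $\beta\,\Lambda^{(n)}$, and the propagator $\beta^{-1} D$. Finally, we have defined the closed loop without vertices as $\tfrac{1}{2}\log|2\pi \beta^{-1} D| = \tfrac{1}{2}\mathrm{Tr}\big(\log(2\pi \beta^{-1} D)\big)$.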
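Thus, we have

$$
\begin{aligned}
F_\beta[J] ={}& H_0 - \frac{1}{2\beta}\mathrm{Tr}\big(\log(2\pi \beta^{-1} D)\big) + \frac{1}{2\beta}\Lambda^{(2)}[D] \\
&- \tfrac{1}{2}\,(J + j - \Lambda^{(1)})^\dagger\,(D - D\Lambda^{(2)}D)\,(J + j - \Lambda^{(1)}) \\
&+ \frac{1}{2\beta}\Lambda^{(3)}[D, m_J] + \frac{1}{6}\Lambda^{(3)}[m_J, m_J, m_J] \\
&+ \frac{1}{8\beta^2}\Lambda^{(4)}[D, D] + \frac{1}{4\beta}\Lambda^{(4)}[D, m_J, m_J] \\
&+ \frac{1}{24}\Lambda^{(4)}[m_J, m_J, m_J, m_J] + \mathcal{O}(\gamma^2),
\end{aligned}
\tag{82}
$$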

where we introduced the zero-order map $m_J = D(J + j)$ for notational convenience. The power of $\beta$ associated with the different diagrams in Eq. [81] is given by the number of vertices minus the number of propagators minus one. Thus, all tree diagrams are of order $\beta^0$, the one-loop diagrams are of order $\beta^{-1}$, and the two-loop diagram is of order $\beta^{-2}$; only the latter two affect the information content:



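$$
I_d = -\tfrac{1}{2}\big(g + \mathrm{Tr}\log(2\pi D)\big) + \tfrac{1}{2}\big(\Lambda^{(2)}[D] + \Lambda^{(3)}[D, m_0]\big) + \tfrac{1}{4}\Lambda^{(4)}\big[D \odot (D + m_0 m_0^\dagger)\big] + \mathcal{O}(\gamma^2),
\tag{83}
$$

where $g = \mathrm{Tr}(1)$, $\beta = 1$, $J = 0$, and thus $m_0 = Dj$.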

2. Free theory 

To obtain the information content of the free theory, we can set $\gamma = 0$ in Eqs. [82] and [83], or use Eq. [49] with the replacements $J \to \beta J$, $j \to \beta j$, $D \to \beta^{-1} D$, and $H_0 \to \beta H_0$. In both cases we find identically

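$$
F_\beta[J] = H_0 - \tfrac{1}{2}(J + j)^\dagger D\,(J + j) - \frac{1}{2\beta}\mathrm{Tr}\log\big(2\pi \beta^{-1} D\big), \quad\text{and}\quad I_d = -\tfrac{1}{2}\mathrm{Tr}\big(1 + \log(2\pi D)\big).
\tag{84}
$$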

Very similarly, one can calculate the information prior to the data, which turns out to be

$$
I_0 = -\tfrac{1}{2}\mathrm{Tr}\big(1 + \log(2\pi S)\big).
\tag{85}
$$

Thus, the data-induced information gain is

$$
\Delta I_d = I_d - I_0 = \tfrac{1}{2}\mathrm{Tr}\big(\log(S D^{-1})\big) = \tfrac{1}{2}\mathrm{Tr}\big(\log(1 + S R^\dagger N^{-1} R)\big).
\tag{86}
$$

The information gain depends on the signal-response-to-noise ratio $Q = S R^\dagger N^{-1} R$, also shortly denoted as the measurement fidelity or quality. The information increases linearly with $Q$ as long as $Q \ll 1$, but levels off to a logarithmic increase for $Q \gg 1$.

Here, we introduced the symmetrized tensor product $A \odot B$ of an $n$-rank tensor $A$ and an $m$-rank tensor $B$, which has the property … with $\mathcal{P}_l$ being the set of permutations of $\{1, \dots, l\}$.

We note that, for the free theory only, the information gain does not depend on the actual data realization.
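Eq. [86] is straightforward to evaluate for a small discretised problem. The sketch below (random matrices, the helper name `information_gain_bits` and the conversion to bits are our own illustrative choices) computes $\tfrac{1}{2}\mathrm{Tr}\log(1 + S R^\dagger N^{-1} R)$ via a log-determinant, checks it against $\tfrac{1}{2}\mathrm{Tr}\log(S D^{-1})$, and shows the linear and logarithmic regimes for a single mode. Note that no data vector enters, in line with the remark above.

```python
import numpy as np

def information_gain_bits(S, R, N):
    """Delta I_d = 1/2 Tr log(1 + S R^T N^-1 R), Eq. (86), in bits (Tr log = log det)."""
    Q = S @ R.T @ np.linalg.solve(N, R)
    sign, logdet = np.linalg.slogdet(np.eye(S.shape[0]) + Q)
    assert sign > 0
    return 0.5 * logdet / np.log(2.0)

rng = np.random.default_rng(1)
n, m = 8, 12
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)                  # illustrative signal covariance
R = rng.normal(size=(m, n))                  # illustrative response
N = np.diag(rng.uniform(0.5, 2.0, size=m))   # illustrative noise covariance

gain = information_gain_bits(S, R, N)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.solve(N, R))
alt = 0.5 * (np.linalg.slogdet(S)[1] - np.linalg.slogdet(D)[1]) / np.log(2.0)
print(gain, alt)

# Single mode: linear growth for Q << 1, logarithmic growth for Q >> 1
for Q in (1e-3, 1e3):
    bits = information_gain_bits(np.array([[Q]]), np.eye(1), np.eye(1))
    print(Q, bits, Q / (2 * np.log(2.0)), 0.5 * np.log2(Q))
```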



E. IFT Recipe 

A typical IFT application will aim at calculating a model evidence $P(d)$, the expectation value of a signal given the data, i.e. the map $m(x) = \langle s(x) \rangle_{(s|d)}$ of the signal, or its variance $\sigma^2(x,y) = \langle (s(x) - m(x))\,(s(y) - m(y)) \rangle_{(s|d)}$ as a measure of the signal uncertainty. The general recipe for such applications can be summarized as follows:

• Specify the signal $s$ and its prior probability distribution $P(s)$. If the signal is derived from a physical field $\psi$ whose prior statistics is known, the distribution of $s = s[\psi]$ is induced according to Eq. [m].

• Specify the data model in terms of a likelihood $P(d|s)$ conditioned on $s$. Again, if the data are related to an underlying physical field $\psi$, the likelihood is given by Eq. [H].

• Calculate the Hamiltonian $H_d[s] = -\log(P(d,s))$, where $P(d,s) = P(d|s)\,P(s)$ is the joint probability, and expand it in a Taylor-Fréchet series for all degrees of freedom of $s$. Identify the coefficients of the constant, linear, quadratic, and $n$th-order terms with the normalization $H_0$, information source $j$, inverse propagator $D^{-1}$, and $n$th-order interaction term $\Lambda^{(n)}$, respectively, as shown in Eq. [36] or [59].

• Draw all diagrams, which contribute to the quan- 
tity of interest, consisting of vertices, lines, and 
open-ends up to some order in complexity or some 
small ordering parameter. The log-evidence is 
given by the sum of all connected diagrams without 
open ends, the expectation value of the signal by 
all connected diagrams with one open end, and the 
signal-variance around this mean by all connected 
diagrams with two open ends. 

• Read the diagrams as computational algorithms specified by the Feynman rules in Sect. IV and implement them by using linear algebra packages or existing map-making codes for the information propagator and vertices. The required discretisation is outlined in Sect. III E. Information on how to implement the required matrix inversions efficiently can be found in the literature given in Secs. I C 2, I C 4, and I C 1, and especially in [H].

• If the resulting non-linear data transformation (or filter) has the required accuracy, e.g. as verified via Monte-Carlo simulations using signal and data realizations drawn from the prior and likelihood, respectively, an IFT algorithm is established.

• In case too large interaction terms in the Hamiltonian prevent a finite number of diagrams from forming a well-performing algorithm, a re-summation of high-order terms is needed. This can be achieved by the saddle-point approximation (classical solution, maximum a posteriori estimator), or even better by a detailed renormalization-flow analysis along the lines outlined in Sect. IV F.
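The Monte-Carlo verification step of this recipe can be sketched for the simplest case, the free theory with a linear response and Gaussian noise. There the Wiener filter $m = Dj$ with $j = R^\dagger N^{-1} d$ is exact, so the empirical error variance must reproduce the diagonal of $D$. Problem sizes, covariances and the sample count are illustrative assumptions; for an interacting theory the same harness would instead be applied to the diagrammatic estimator under test.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pix, n_data, n_samples = 20, 15, 4000

# Illustrative prior covariance (exponential kernel), response and white noise
x = np.linspace(0.0, 1.0, n_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
R = rng.normal(size=(n_data, n_pix)) / np.sqrt(n_pix)
noise_var = 0.05 * np.ones(n_data)

# Filter: D = (S^-1 + R^T N^-1 R)^-1, m = D j with j = R^T N^-1 d
M = R.T @ (R / noise_var[:, None])
D = np.linalg.inv(np.linalg.inv(S) + M)

# Draw signal and data realisations from prior and likelihood
L = np.linalg.cholesky(S)
s = L @ rng.normal(size=(n_pix, n_samples))
d = R @ s + np.sqrt(noise_var)[:, None] * rng.normal(size=(n_data, n_samples))
j = R.T @ (d / noise_var[:, None])
m = D @ j

# Empirical error variance should match the diagonal of D
err = s - m
ratio = np.mean(err**2, axis=1) / np.diag(D)
print(ratio.mean())
```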



V. COSMIC LARGE-SCALE STRUCTURE VIA 
GALAXY SURVEYS 

A. Poissonian data model and Hamiltonian 

Many datasets suffer from Poisson noise, which is non-Gaussian and signal dependent; they are therefore well suited to test IFT in the non-linear regime. For example, the cosmological LSS is traced by galaxies, which may be assumed to be generated by a Poisson process. On large scales, the expectation value of the galaxy density follows that of the underlying (dark) matter distribution. The aim of cosmography is to recover the initial density field from the shot-noise contaminated galaxy data. Currently, large galaxy surveys are conducted in order to chart the cosmic matter distribution in three dimensions. Improving the galaxy-based LSS reconstruction techniques and understanding their uncertainties better is therefore an imminent and important goal. Optimal techniques to reconstruct Poissonian-noise affected signals are also crucial for other problems, since e.g. imaging with photon detectors plays an important role in astronomy and other fields. Here, we outline how such problems can be treated, by discussing a specific data model motivated by the problem of large-scale-structure reconstruction from galaxies. For this problem we work out the optimal estimator and show its superiority numerically. A more general discussion of models of galaxy and structure formation, and references to relevant works, were given in Sect. II C 4.

In order to treat the Poissonian case in a convenient fashion, we subdivide the physical space into small cells with volumes $\Delta V$, and assume that a cell located at $x_i$ has an expected number of observed galaxies

$$
\mu_i = \langle d_i \rangle_{(d|s)} = \kappa\,\big(1 + b\, s(x_i)\big),
\tag{87}
$$

with $\kappa = \bar{n}_g \Delta V$ being the cosmic average number of galaxies per cell and $b$ being the bias of the galaxy overdensity with respect to the dark matter overdensity $s$, still assumed to be a Gaussian random field (Eq. [55]).
However, this data model has two shortcomings. First, too negative fluctuations of the Gaussian random field, with $b\, s < -1$, lead to negative expectation values, for which the Poissonian statistics is not defined. Second, the mean density of observable galaxies $\bar{n}_g$ and their bias parameter $b$ are constant everywhere, whereas in reality both exhibit spatial variations. Although $\kappa$ and $b$ are now spatially inhomogeneous, we assume them to be known for the moment and to incorporate all the above observational effects.

To cure the above-mentioned shortcomings, we replace Eq. [87] by a non-linear and non-translationally invariant model:

$$
\mu(x_i) = \kappa(x_i)\, \exp\big(b(x_i)\, s(x_i)\big),
\tag{88}
$$



where $\kappa$ and $b$ may depend on position in a known way, and the unknown Gaussian field $s$, the log-matter density, may exhibit unrestricted negative fluctuations. Note that $\mu$ is the signal response, by our definition in Eq. [10], since $\mu[s] = \langle d \rangle_{(d|s)}$. We call $\kappa$ the zero-response, since $\mu[0] = \kappa$. It should be stressed that the data model in Eq. [88] is just a convenient choice for illustration and proof-of-concept purposes, and is easily exchangeable with more realistic, and even non-local, data models. However, this log-normal data model was originally proposed by Coles and Jones [212], investigated for constrained realizations by Sheth [To7] and Vio et al. [213], and seems to reproduce the statistics of LSS simulations much better than the often used normal distribution of the overdensity [214].

Having chosen a Poissonian process to populate the Universe and our observational data with galaxies according to the underlying log-density field $s$, the likelihood is

$$
P(d|s) = \prod_i \frac{\mu_i^{d_i}\, e^{-\mu_i}}{d_i!} = \exp\Big[\sum_i \big(d_i \log \mu_i - \mu_i - \log(d_i!)\big)\Big],
\tag{89}
$$

where $d_i$ is the actual number of galaxies observed in cell $i$. Since $P(s) = \mathcal{G}(s, S)$, the Hamiltonian is given by

$$
\begin{aligned}
H_d[s] &= -\log P(d,s) = -\log P(d|s) - \log P(s) \\
&= -d^\dagger (b\, s) + \kappa^\dagger \exp(b\, s) + \mathbf{1}^\dagger \log(d!) - d^\dagger \log\kappa + \tfrac{1}{2}\, s^\dagger S^{-1} s + \tfrac{1}{2}\log|2\pi S| \\
&= \tfrac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0 + \sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda_n^\dagger s^n, \quad\text{with}
\end{aligned}
\tag{90}
$$

$$
D^{-1} = S^{-1} + \widehat{\kappa b^2}, \quad j = b\,(d - \kappa), \quad H_0 = \tfrac{1}{2}\log|2\pi S| + \big(\kappa + \log(d!)\big)^\dagger \mathbf{1} - d^\dagger \log\kappa, \quad\text{and}\quad \Lambda_n = \kappa\, b^n.
$$

Such variations are due to the geometry of the observational survey sky coverage, due to a galaxy selection function which decreases with distance from the observer, and due to a changing composition of the galaxy population. The latter distance effects are caused by the cosmic evolution of galaxies and by the changing observational detectability of the different types with distance. We note that an observed sample of galaxies, which was selected deterministically or stochastically from a complete sample, e.g. by their luminosity due to instrumental sensitivity, still possesses Poissonian statistics if the original distribution does.

The hat on a scalar field denotes that it should be read as a matrix, which is diagonal in position space (see Appendix). A few remarks are in order. Comparing the propagator to the one of our Gaussian theory, one can read off an inverse noise term $M = R^\dagger N^{-1} R = \widehat{\kappa b^2}$. Thus the effective (inversely response-weighted) noise decreases with increasing mean galaxy number and bias, and becomes infinite in regions without data ($\kappa = 0$) without causing any problem for the formalism.

The information source $j$ increases with increasing response (bias) of the data (galaxies) to the signal (density fluctuations). However, it vanishes for zero response ($b = 0$) or in case the observed galaxy counts match the expected mean at a given location exactly. Finally, the interaction terms $\Lambda_n$ are local in position space, and vanish with decreasing $b$ and $\kappa$. The latter parameter is under the control of the data analyst, since it is proportional to the volume of the individual pixels, and can therefore be made arbitrarily small by choosing a finer resolution in signal space. However, this would not change the convergence properties of the series, since any interaction vertex then has to be summed over a correspondingly larger number of pixels within a coherence patch of the signal, which exactly compensates for the smaller coefficient. The bias, in contrast, is set by nature and can be regarded as a power-counting parameter, which naturally provides a numerical hierarchy among the higher-order vertices and diagrams for $b < 1$. Note that $j = \mathcal{O}(b)$.
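The identification of $H_0$, $j$, $D^{-1}$ and $\Lambda_n$ in Eq. [90] can be verified on a single cell with an independent Gaussian prior of variance $S$, a simplification we adopt only for this check (the numbers are illustrative): the exact negative log joint probability and the series, truncated at a high order, agree.

```python
import math

def H_exact(s, d, kappa, b, S):
    """-log P(d|s) - log P(s) for one cell: Poisson likelihood with mu = kappa exp(b s),
    Gaussian prior of variance S."""
    mu = kappa * math.exp(b * s)
    neg_log_like = -(d * math.log(mu) - mu - math.lgamma(d + 1))
    neg_log_prior = 0.5 * s**2 / S + 0.5 * math.log(2 * math.pi * S)
    return neg_log_like + neg_log_prior

def H_series(s, d, kappa, b, S, n_max=40):
    """Expansion with the coefficients read off in Eq. (90)."""
    H0 = 0.5 * math.log(2 * math.pi * S) + kappa + math.lgamma(d + 1) - d * math.log(kappa)
    j = b * (d - kappa)
    D_inv = 1.0 / S + kappa * b**2
    interactions = sum(kappa * b**n * s**n / math.factorial(n) for n in range(3, n_max + 1))
    return H0 - j * s + 0.5 * D_inv * s**2 + interactions

for s, d in [(-1.3, 0), (0.2, 3), (1.7, 12)]:
    print(H_exact(s, d, 4.0, 0.8, 1.0), H_series(s, d, 4.0, 0.8, 1.0))
```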



B. Galaxy types and bias variations 

Real galaxies can be cast into different classes, which 
all differ in terms of their luminosities, bias factors, and 
the frequencies with which they are found in the Uni- 
verse. Although we are not going to investigate this 
complication in the following, it should be explained here 
how all the formulae in this section can easily be reinter- 
preted, in order to incorporate also the different classes 
of galaxies. 

The galaxies can be characterized by a type variable $L \in \Omega_{\rm type}$, which may be the intrinsic luminosity, the morphological galaxy type, or a multi-dimensional combination of all properties which determine the galaxy type's spatial distribution via an $L$-dependent bias $b_L$, and their detectability as encoded in $\kappa_L$. The data space is now spanned by $\Omega_{\rm data} = \Omega_{\rm space} \times \Omega_{\rm type}$, and $\mu$, $\kappa$ and $b$ can also be regarded as functions over this space.

Performing the same algebra as in the previous section, just taking the larger data space into account, we arrive at exactly the same Hamiltonian as in Eq. [90] if we interpret any term containing $d$, $\kappa$ and $b$ to be summed or integrated over the type variable $L$. Thus, we read

$$
\begin{aligned}
j(x) &\equiv \big(b\,(d - \kappa)\big)(x) = \int dL\, b_L(x)\,\big(d_L(x) - \kappa_L(x)\big), \\
D^{-1}_{xy} &= \big(S^{-1} + \widehat{\kappa b^2}\big)_{xy} = S^{-1}_{xy} + \delta(x - y) \int dL\, \kappa_L(x)\, b_L^2(x), \\
\Lambda_n(x) &\equiv (\kappa\, b^n)(x) = \int dL\, \kappa_L(x)\, b_L^n(x), \quad\text{and} \\
\mu[s](x) &= (\kappa\, e^{b s})(x) \equiv \int dL\, \kappa_L(x)\, e^{b_L(x)\, s(x)} = \int dL\, \mu_L[s](x),
\end{aligned}
\tag{91}
$$

which all live solely in $\Omega_{\rm space}$, so that the computational complexity of the matter distribution reconstruction problem is not affected at all, and only a bit more book-keeping is required in its setup.
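For a discrete set of galaxy classes, the integrals over $L$ in Eq. [91] become sums. The sketch below (class count, arrays and the helper name `type_summed_terms` are illustrative assumptions) builds $j$, the local noise term $\widehat{\kappa b^2}$, $\Lambda_n$ and $\mu[s]$ from per-class quantities, and confirms that for a common bias these reduce to the single-class expressions with marginalised counts $d = \sum_L d_L$ and $\kappa = \sum_L \kappa_L$.

```python
import numpy as np

def type_summed_terms(d_L, kappa_L, b_L, s, n=3):
    """Type-summed quantities of Eq. (91) for discrete classes.
    d_L, kappa_L, b_L: arrays of shape (n_types, n_pix); s: signal of shape (n_pix,)."""
    j = np.sum(b_L * (d_L - kappa_L), axis=0)
    kappa_b2 = np.sum(kappa_L * b_L**2, axis=0)        # diagonal of R^T N^-1 R
    Lambda_n = np.sum(kappa_L * b_L**n, axis=0)
    mu = np.sum(kappa_L * np.exp(b_L * s[None, :]), axis=0)
    return j, kappa_b2, Lambda_n, mu

rng = np.random.default_rng(0)
n_types, n_pix = 4, 50
kappa_L = rng.uniform(0.1, 5.0, size=(n_types, n_pix))
d_L = rng.poisson(kappa_L).astype(float)
s = rng.normal(size=n_pix)

# Common bias for all types: the result equals the marginalised single-class problem
b_common = np.full((n_types, n_pix), 1.2)
summed = type_summed_terms(d_L, kappa_L, b_common, s)
marginal = type_summed_terms(d_L.sum(0, keepdims=True), kappa_L.sum(0, keepdims=True),
                             b_common[:1], s)
print(all(np.allclose(a, c) for a, c in zip(summed, marginal)))   # True
```

With type-dependent biases the two results differ, which is where the extra book-keeping pays off.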

A few observations are in order. In case all galaxies have the same bias factor, Eq. [91] is simply a marginalization over the type variable $L$, and any differentiation of the various galaxy types is not necessary. Since all known galaxy types seem to have $b \sim \mathcal{O}(1)$, such a marginalization seems to be justified, and explains why LSS reconstructions which applied this simplification are relatively successful, although the different galaxy masses, luminosities, and frequencies vary by orders of magnitude. As our numerical experiments below reveal, the data, and therefore the reconstructability of the density field, exhibit a sensitive dependence on the bias for $s$-fluctuations with unit variance. Such a variance is indeed observed on scales below 10 Mpc in the galaxy distribution, and therefore the galaxy-type-dependent bias variation does indeed matter. Larger galaxies, which have larger biases, therefore provide per galaxy a slightly larger information source ($j \propto b$), less shot noise ($R^\dagger N^{-1} R \propto b^2$), and increasingly larger higher-order interaction terms ($\Lambda_n \propto b^n$) in comparison to smaller galaxies. However, smaller galaxies are more numerous by orders of magnitude, and therefore provide the largest total contribution to the information source, the noise reduction and most low-order interaction terms. Thus, the latter will dominate and therefore permit a reasonably accurate matter reconstruction from an inhomogeneous galaxy survey using a single bias value. Nevertheless, improvements of the bias treatment are possible by applying the recipes described here.



$\kappa$ seems to control the stiffness of the response renormalization flow equation introduced later, and its value is therefore numerically relevant. A lower $\kappa$, due to a finer spatial pixelisation, results in a less stiff and better-behaved equation.



This is found for our specific data model $\mu \propto \exp(b s)$, but should also apply to other models, which somehow have to keep $\mu > 0$ even for $b s < -1$.
C. Non-linear map making

The map, i.e. the expectation of our information field $s$ given the data, is to the lowest order in the interactions

$$
m_{1,x} = D_{xy}\, j_y - \tfrac{1}{2}\, D_{xy}\, \kappa_y\, b_y^3 \big(D_{yy} + m_{0,y}^2 + D_{yy}\, b_y\, m_{0,y}\big) + \ldots,
\tag{92}
$$

or in compact notation

$$
m_1 = m_0 - \tfrac{1}{2}\, D\, \widehat{\kappa b^3}\, \big(\widehat{D} + m_0^2 + \widehat{D}\, b\, m_0\big) + \ldots
\tag{93}
$$

It is apparent that the non-linear map-making formula contains corrections to the linear map $m_0 = Dj$. The first two correction terms are always negative, reflecting the fact that our non-linear data model has asymmetric fluctuations in the data with respect to the mean. The last correction term is directed opposite to the linear map, thereby correcting for the curvature in the signal response.

A one-dimensional numerical example is displayed in Fig. [1]. There, the signal was realized to have a power spectrum $P_s(k) \propto (k^2 + q^2)^{-1}$, with a correlation length $1/q = 0.04$. The normalization was chosen such that the auto-correlation function is $\langle s(x)\, s(x + r) \rangle_{(s)} = \exp(-q|r|)$, and therefore the signal dispersion is unity, $\langle s^2 \rangle_{(s)} = 1$. The data are generated by a Poissonian process from $\mu = \kappa \exp(b s)$ with $b = 0.5$. All three displayed reconstructions exhibit less power than the original signal, as expected, since the reconstruction is conservative and therefore biased towards zero.
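The following sketch mimics the setup of Fig. [1] on a coarser grid and computes the Wiener-filter map $m_0 = Dj$ and its first-order correction $m_1$ from Eq. [93]. The grid size, random seed and the AR(1) way of drawing the exponentially correlated signal are our own choices; the remaining parameters ($b = 0.5$, correlation length 0.04, unit variance, 1000 galaxies per unit length at zero signal) follow the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix, lam, b = 256, 0.04, 0.5
dx = 1.0 / n_pix
x = (np.arange(n_pix) + 0.5) * dx
kappa = 1000.0 * dx * np.ones(n_pix)          # zero-response: 1000 galaxies per unit length

# Unit-variance signal with correlation exp(-|r|/lam), drawn as an AR(1) process
rho = np.exp(-dx / lam)
s = np.empty(n_pix)
s[0] = rng.normal()
for i in range(1, n_pix):
    s[i] = rho * s[i - 1] + np.sqrt(1.0 - rho**2) * rng.normal()
S = np.exp(-np.abs(x[:, None] - x[None, :]) / lam)

# Poissonian data with log-normal response mu = kappa exp(b s), Eq. (88)
d = rng.poisson(kappa * np.exp(b * s)).astype(float)

# Free-theory quantities from Eq. (90)
j = b * (d - kappa)
D = np.linalg.inv(np.linalg.inv(S) + np.diag(kappa * b**2))
D_hat = np.diag(D).copy()
m0 = D @ j

# First-order corrected map, Eq. (93)
m1 = m0 - 0.5 * D @ (kappa * b**3 * (D_hat + m0**2 + D_hat * b * m0))

def rms(r):
    return float(np.sqrt(np.mean(r**2)))

print(rms(m0 - s), rms(m1 - s))
```

Whether $m_1$ improves on $m_0$ for a given realisation depends on the signal strength; for $b \gtrsim 1$ the text warns that these first-order terms overcorrect.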

The non-linear correction to the naive map $m_0$ should not be too large, otherwise higher-order diagrams have to be included. In the case displayed in Fig. [1], $b = 0.5$ ensured that the linear corrections were mostly going in the right direction. However, in case $b \gtrsim 1$ there is no obvious ordering of the importance of the different interaction vertices, and numerical experiments reveal that the first-order corrections strongly overcorrect the linear map $m_0 = Dj$. In such a case interaction re-summation techniques should be used to incorporate as many higher-order interaction terms as possible. One very powerful re-summation is provided by the classical solution, as developed below, which contains all tree diagrams simultaneously. This solution, also shown in Fig. [1], is very close to $m_1$ in this case.
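D. Classical solution

The classical signal field or MAP solution is given by Eq. [Til], which reads in this case

$$
s_{\rm cl} = D\,\big[j - \widehat{\kappa b}\,\big(e^{b s_{\rm cl}} - 1 - b\, s_{\rm cl}\big)\big] = S\, b\,\big(d - \kappa\, e^{b s_{\rm cl}}\big).
\tag{94}
$$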
The last expression motivates the introduction of the expected number of galaxies given the signal $s$:

$$
\kappa_s = \kappa\, e^{b s}.
\tag{95}
$$

Alternative forms of the MAP equation can also be derived, for example one which is especially suitable for large $j$:

$$
s_{\rm cl} = \frac{1}{b} \log\left(\frac{d}{\kappa} - \frac{S^{-1} s_{\rm cl}}{\kappa\, b}\right).
\tag{96}
$$

This may be solved iteratively, while ensuring that $S^{-1} s^{(i)}_{\rm cl} < b\, d$ at all iterations $i$, with equality only where $\kappa = 0$. This form of the classical field equation has some similarities to the naive inversion of the response formula, $\langle d \rangle_{(d|s)} = \kappa \exp(b s)$, which yields

$$
s_{\rm naive} = \frac{1}{b} \log\frac{d}{\kappa},
\tag{97}
$$
a formula one can only dare to use in regimes of large $d$. Since $s_{\rm naive}$ contains the full noise of the data, a suitable naive map may be given by $m_{\rm naive} = S\, s_{\rm naive}$, after some fix for the locations without galaxy counts. The classical solution, however, is more conservative than this naive data inversion, in that there is a damping term, $S^{-1} s_{\rm cl}/(\kappa\, b)$, which somewhat compensates for the influence of too large data points.

These equations permit the calculation of the classical solution if suitable numerical regularization schemes are applied, since naive iterations can easily lead to numerical divergences in the non-linear case.

One way of doing this is by turning the classical equation (Eq. [94]) into a dynamical system. Its initial conditions are given by a well-solvable linear or even trivial problem, to which non-linear complications are added successively during an interval of some pseudo-time. The endpoint of this dynamics is then the required solution. The meaning of the pseudo-time depends on the way it was set up. In any case, it can just be regarded as a mathematical trick to generate a differential equation, which might be easier to solve numerically than the original problem.

For example, a pseudo-time $\tau$ can be introduced by setting $j(\tau) = \tau\, j$. Thus, the information source is successively injected into an initially trivial field state, $s_{\rm cl}(0) = 0$. This allows one to set up a differential equation for $s_{\rm cl}(\tau)$ by taking the time derivative of Eq. [M]:

$$
\frac{d s_{\rm cl}(\tau)}{d\tau} = \Big(S^{-1} + \widehat{b^2 \kappa_{s_{\rm cl}(\tau)}}\Big)^{-1} j \equiv D_{s_{\rm cl}(\tau)}\, j,
\tag{98}
$$

which has to be solved for $s_{\rm cl}(1)$ starting from $s_{\rm cl}(0) = 0$. This equation is very appealing, since it looks like Wiener-filtering an incoming information stream $j$ and
FIG. 1: Poissonian reconstruction of a signal with unit variance and correlation length 0.04, observed with slightly non-linear response ($b = 0.5$, resolution: 513 pixels per unit length, zero-signal galaxy density: 1000 galaxies per unit length). Top: data $d$, signal response $\mu$, and zero-response $\kappa$. Middle: signal $s$, linear Wiener-filter reconstruction $m_0 = Dj$, its one-sigma error interval $m_0 \pm \widehat{D}^{1/2}$, next-order reconstruction $m_1$ according to Eq. [92], and classical solution $s_{\rm cl}$ according to Eq. [94]. Although the linear Wiener filter reconstructs well at most locations, the non-linear response requires the perturbative corrections present in $m_1$ or the classical solution in regions of high signal strength. Bottom: the residuals, i.e. the deviations of $m_0$, $m_1$, $s_{\rm cl}$ from the signal, and the Wiener uncertainty $\pm\widehat{D}^{1/2}$.



accumulating the filtered data, while simultaneously tuning the filter $D_{s_{\rm cl}(\tau)}$ to the accumulated knowledge on the signal $s_{\rm cl}(\tau)$ and the thereby implied Poissonian noise structure. Thus, it is a nice example system for continuous Bayesian learning, and it also illustrates how different datasets can successively be fused into a single knowledge base.
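A minimal sketch of the pseudo-time integration, assuming a classical fixed-step fourth-order Runge-Kutta integrator (our choice, not prescribed by the text) and a toy version of the Fig. [1] setup (grid, seed and parameters are illustrative). The right-hand side is $(S^{-1} + \widehat{b^2 \kappa\, e^{b s_{\rm cl}}})^{-1} j$, as obtained by differentiating $S^{-1} s_{\rm cl} + \widehat{\kappa b}\,(e^{b s_{\rm cl}} - 1) = \tau j$ with respect to $\tau$. At $\tau = 1$ the result should satisfy the classical equation $S^{-1} s_{\rm cl} = b\,(d - \kappa\, e^{b s_{\rm cl}})$, which is checked via its residual.

```python
import numpy as np

rng = np.random.default_rng(7)
n_pix, lam, b = 128, 0.04, 0.5
dx = 1.0 / n_pix
x = (np.arange(n_pix) + 0.5) * dx
kappa = 1000.0 * dx * np.ones(n_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / lam)
S_inv = np.linalg.inv(S)
s_true = np.linalg.cholesky(S) @ rng.normal(size=n_pix)
d = rng.poisson(kappa * np.exp(b * s_true)).astype(float)
j = b * (d - kappa)

def rhs(s_cl):
    """ds_cl/dtau = (S^-1 + diag(b^2 kappa e^{b s_cl}))^-1 j."""
    D_inv = S_inv + np.diag(b**2 * kappa * np.exp(b * s_cl))
    return np.linalg.solve(D_inv, j)

def flow_to_classical(n_steps=200):
    """Integrate from s_cl(0) = 0 to tau = 1 with fixed-step RK4."""
    h = 1.0 / n_steps
    s_cl = np.zeros(n_pix)
    for _ in range(n_steps):
        k1 = rhs(s_cl)
        k2 = rhs(s_cl + 0.5 * h * k1)
        k3 = rhs(s_cl + 0.5 * h * k2)
        k4 = rhs(s_cl + h * k3)
        s_cl = s_cl + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return s_cl

s_cl = flow_to_classical()
residual = S_inv @ s_cl - b * (d - kappa * np.exp(b * s_cl))
print(np.max(np.abs(residual)))
```

Because the system matrix stays positive definite along the whole path, each step is a well-posed linear solve, which is what makes this formulation more robust than a naive fixed-point iteration.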

Map-making algorithms with a higher fidelity are possible by not only investigating the maximum of the posterior, but by averaging the signal $s$ over the full support of $P(s|d)$. In any case, we can assume that a good approximation $t \approx s_{\rm cl}$ to the classical solution can be achieved. Figs. [1] and [2] display classical solutions for slightly and strongly non-linear Poissonian inference problems. Especially the second example shows that the classical solution can be improved in regions of large uncertainty (see the region between $x = 0.2$ and $0.5$ in Fig. [2], where apparently better estimators exist) by the missing uncertainty-loop diagrams, which contain information about the non-Gaussian structure of the posterior $P(s|d)$ away from $s_{\rm cl}$.
FIG. 2: Poissonian reconstruction of the same signal realization as in Fig. [1] (unit variance and correlation length 0.04), observed now with a strongly non-linear response ($b = 2.5$, resolution: 512 pixels per unit length, zero-signal galaxy density: 100 galaxies per unit length where the mask is one) through a complicated mask. Top: data $d$, signal response $\mu$, and zero-response $\kappa$. Middle: signal $s$, classical solution $s_{\rm cl} = m_{\tau=0}$, intermediate solution $m_{\tau=0.5}$ and renormalization-based reconstruction $m_{\tau=1}$ with uncertainty interval $m_{\tau=1} \pm \widehat{D}_{\tau=1}^{1/2}$, and mask $\kappa/(\bar{n}_g \Delta V)$. The linear Wiener-filter reconstruction $m_0$ as well as its next-order corrected version $m_1$ are not displayed, since they partly lie far outside the displayed area. Bottom: deviations of the three reconstructions from the signal, and the original and the renormalized uncertainty estimates $\pm\widehat{D}_0^{1/2}$ and $\pm\widehat{D}_{\tau=1}^{1/2}$, respectively. Note that in the regions with many observed galaxies, the high signal-to-noise ratio can be seen in the narrowness of $\widehat{D}_{\tau=1}^{1/2}$, which is significantly smaller than the data-unaffected $\widehat{D}_0^{1/2}$ at these locations.



E. Uncertainty-loop corrections 

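Now, we show how the missing uncertainty-loop corrections can be added to the classical solution. These corrections can be derived from the Hamiltonian of the uncertainty field $\phi = s - t$,

$$
H_t[\phi] = \tfrac{1}{2}\, \phi^\dagger D_t^{-1} \phi - j_t^\dagger \phi + \kappa_t^\dagger\, g(b\,\phi) + H_{0,t}, \quad\text{where}
$$

$$
D_t^{-1} = S^{-1} + \widehat{b^2 \kappa_t}, \qquad j_t = b\,(d - \kappa_t) - S^{-1} t, \qquad g(z) = \sum_{m=3}^{\infty} \frac{z^m}{m!},
\tag{99}
$$

with $\kappa_t = \kappa\, e^{b t}$ as in Eq. [95], and $H_{0,t}$ is a momentarily irrelevant normalization constant. Again, we have permitted a non-zero $j_t$, since $t$ might not be exactly the classical solution.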

It is interesting to note that the interaction coefficients in this Hamiltonian, $\Lambda^{(m)}_t = \kappa_t\, b^m$, all reflect the expected number of galaxies given the reference field $t$. Thus, the replacement $\kappa \to \kappa_t$ would provide us with the shifted field Hamiltonian as defined in Eq. [60], except for the term $-S^{-1} t$ in $j_t$. It turns out that this term is a sort of counter-term, which accumulates the effect of the non-linear interactions.

We see that effective interaction terms arise when relevant parts of the solution are absorbed in the background field $t$. A similar approach is desirable for the loop diagrams. Instead of drawing and calculating all possible loop diagrams, we want to absorb several of them simultaneously into effective coefficients. For each vertex of the Poissonian Hamiltonian with $m$ legs, there exist diagrams in any Feynman expansion in which a number $n$ of simple loops are added to this vertex. Such an $n$-loop enhanced $m$-vertex is given by

$$
\frac{1}{n!} \left(\frac{\widehat{D}}{2}\right)^{n} \Lambda_t^{(m+2n)} = \frac{1}{n!} \left(\frac{b^2 \widehat{D}}{2}\right)^{n} \kappa_t\, b^m .
\tag{100}
$$



instead of its logarithm, $s$. Here $c$ fixes the relation between $s$ and $\rho$, and $\rho_0$ is the cosmic median dark matter density. Translating our log-density map into the density results in the naive density estimator

$$
m_\rho^{\rm naive} = \rho_0\, e^{c\, m},
\tag{106}
$$

which is not optimal in the sense of minimal rms deviations. For a Gaussian posterior with mean $m$ and covariance $D$, the proper estimator would be

$$
m_\rho = \langle \rho_0\, e^{c s} \rangle_{(s|d)} = \rho_0\, e^{c\, m + c^2 \widehat{D}/2},
\tag{107}
$$

which contains uncertainty-loop corrections accounting for the shift of the mean under the non-linear transformation between log-density and density.
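The difference between Eqs. [106] and [107] is easy to see numerically, assuming (as the closed form does) that the posterior of $s$ at a given location is Gaussian with mean $m$ and variance $\widehat{D}$. The values of $\rho_0$, $c$, $m$ and $\widehat{D}$ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
rho0, c = 1.0, 1.0
m, D_hat = 0.3, 0.5                       # illustrative posterior mean and variance of s

samples = rng.normal(m, np.sqrt(D_hat), size=1_000_000)
mc_mean = np.mean(rho0 * np.exp(c * samples))          # Monte-Carlo posterior mean of rho
naive = rho0 * np.exp(c * m)                           # Eq. (106)
proper = rho0 * np.exp(c * m + 0.5 * c**2 * D_hat)     # Eq. (107)
print(naive, proper, mc_mean)
```

The naive estimator underestimates the posterior mean density by the factor $e^{-c^2 \widehat{D}/2}$, which grows with the local uncertainty.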
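All these diagrams can be re-summed into an effective interaction vertex via

$$
\widetilde{\Lambda}^{(m)} = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{\widehat{D}}{2}\right)^{n} \Lambda_t^{(m+2n)} = \kappa_t\, \exp\!\left(\frac{b^2}{2}\widehat{D}\right) b^m = \Lambda^{(m)}_{t + \frac{b}{2}\widehat{D}}.
\tag{101}
$$

Thus, this re-summation is effectively equivalent to the replacement

$$
t \to t + \frac{b}{2}\, \widehat{D},
\tag{102}
$$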



which reflects the larger expected response to a reference field $t$ due to the uncertainty fluctuations around it. Those fluctuations pick up the asymmetric shape of the exponential term in the Hamiltonian, where the larger response to positive fluctuations is not fully compensated by the lower response to negative fluctuations. One might wonder if the simple replacement rule in Eq. [102] could supplement the classical solution with the missing uncertainty-loop corrections. Thus we ask if the modified classical equation

$$
m = S\, b\,\big(d - \kappa_{m + b\widehat{D}/2}\big),
\tag{103}
$$

together with a self-consistently determined propagator

$$
D = \Big(S^{-1} + \widehat{b^2 \kappa_{m + b\widehat{D}/2}}\Big)^{-1},
\tag{104}
$$

could provide the mean field given the data. A more rigorous renormalization calculation will show that this is indeed the case, within some approximation.
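The re-summation in Eq. [101] rests on the Gaussian identity $\langle e^{b\phi} \rangle = e^{b^2 \widehat{D}/2}$ for zero-mean fluctuations of variance $\widehat{D}$. The sketch below (illustrative numbers at a single location; the helper `Lambda_t` is hypothetical) compares the truncated sum over $n$-loop enhanced vertices with the closed form $\Lambda^{(m)}_{t + b\widehat{D}/2}$, and with a Monte-Carlo average of the response $\kappa\, e^{b(t+\phi)}$.

```python
import math
import numpy as np

kappa, b, t, D_hat, m = 3.0, 1.5, 0.2, 0.4, 3      # illustrative values at one location

def Lambda_t(order, ref):
    """Local interaction coefficient kappa e^{b ref} b^order of the phi-theory."""
    return kappa * math.exp(b * ref) * b**order

# Sum over n-loop enhanced m-vertices, truncated at n = 30
resummed = sum((D_hat / 2) ** n / math.factorial(n) * Lambda_t(m + 2 * n, t) for n in range(31))
closed_form = Lambda_t(m, t + b * D_hat / 2)         # Eq. (101)

# The same shift appears in the mean response under Gaussian fluctuations
rng = np.random.default_rng(5)
phi = rng.normal(0.0, math.sqrt(D_hat), size=1_000_000)
mc_response = np.mean(kappa * np.exp(b * (t + phi)))
print(resummed, closed_form, mc_response, kappa * math.exp(b * (t + b * D_hat / 2)))
```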

The loop-corrected density and propagator permit the construction of estimators for the dark matter density itself.



F. Response renormalization 

Since we are dealing with a $\phi^\infty$-field theory, the zoo of loop diagrams is quite complex and forms something like a Feynman foam. In order not to get stuck in this foam, we urgently require a trick to keep either the maximal order of the diagrams low, or to limit the number of vertices per diagram, or both. We basically have two handles on any interaction term $\Lambda_n = \kappa\, b^n$: the bias $b$ and the zero-response $\kappa$. We concentrate on the response, since it enters the Hamiltonian in a linear way, and the data can also be regarded as proportional to $\kappa$. Thus, the full Hamiltonian

$$
H[s] = \tfrac{1}{2}\, s^\dagger S^{-1} s - d^\dagger (b\, s) + \kappa^\dagger e^{b s}
\tag{108}
$$
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can be regarded to be proportional to the response, ex- 
cept for the prior term and also constant terms we im- 
mediately drop here and in the following. 

Let us assume that prior to any data analysis we have an initial guess $m_0$ for the signal with some Gaussian uncertainty characterized by the covariance $D_0$. This can be expressed via a Hamiltonian of the form

$$H_0[s] = \frac{1}{2}\,(s - m_0)^{\dagger} D_0^{-1} (s - m_0), \qquad (109)$$

which defines a probability density via $P_0(s) \propto \exp(-H_0[s])$. If the initial guess is the prior, we have $m_0 = 0$ and $D_0 = S$, but we need not restrict ourselves to this case. Now, we want to incorporate the information of the full problem step by step, and forget our initial guess at the same rate. This can be modeled by adopting an affine parameter $\tau$, which measures how much we have exposed ourselves to the full problem. For each $\tau$, which we regard as a pseudo-time, our knowledge state is described by a Hamiltonian $H_\tau$. Increasing $\tau$ by some small amount $\epsilon$ should therefore lead to the next knowledge state, characterized by

$$H_{\tau+\epsilon}[s] = H_{\tau}[s] + \epsilon\,\left(H[s] - H_{\tau}[s]\right). \qquad (110)$$

FIG. 3: The original propagator $D_0 = (S^{-1} + \kappa b^{2})^{-1}$ (left) and the final propagator of the renormalization flow, $D$ (Eq. 117, right), in logarithmic grey scaling for the data displayed in Fig. 2. The values on the diagonals show the local uncertainty variance (in Gaussian approximation) before ($D_0$) and after ($D$) the data are analyzed, respectively. The bottom-left and top-right corners exhibit non-vanishing propagator values due to the assumed periodic spatial coordinate, which puts these corners close to the two others on the matrix diagonal.



This equation just models an asymptotic approach to the correct Hamiltonian: integrated over $\tau$ it gives $H_\tau = H - (H - H_0)\, e^{-\tau}$. If the initial guess was the prior, $H - H_0$ is proportional to $\kappa$. For infinitesimal steps $\epsilon$, the knowledge flow therefore corresponds to tuning up all terms proportional to $\kappa$,

$$\kappa \to \kappa_{\tau} = \left(1 - e^{-\tau}\right)\kappa .$$

This motivates the term response renormalization for this kind of continuous learning system. Both the information source and the interactions are fed into it at the same rate.
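If the initial guess is the prior, Eq. (110) only ever rescales the $\kappa$-proportional (likelihood) part of the Hamiltonian. The sketch below tracks that scalar weight under repeated small steps and compares it with $1 - e^{-\tau}$. The helper `likelihood_weight` is hypothetical and is not part of the original method.

```python
import math


def likelihood_weight(tau, eps=1e-4):
    """Weight w of the kappa-proportional terms after flowing to pseudo-time tau.

    Each step applies Eq. (110), H <- H + eps * (H_full - H), to
    H = prior + w * likelihood, which only changes the scalar w.
    """
    w = 0.0  # initial guess = prior: no likelihood terms yet
    for _ in range(int(round(tau / eps))):
        w += eps * (1.0 - w)
    return w


for tau in (0.5, 1.0, 3.0):
    print(tau, likelihood_weight(tau), 1.0 - math.exp(-tau))
```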

The trick for the renormalization procedure is to approximate the knowledge state at each moment $\tau$ by a Gaussian, and therefore the Hamiltonian by a free one (quadratic in the signal). Thus we set

$$H_{\tau}[s] = \frac{1}{2}\,(s - m_{\tau})^{\dagger} D_{\tau}^{-1} (s - m_{\tau}), \qquad (111)$$

where $m_\tau$ and $D_\tau = (S^{-1} + M_\tau)^{-1}$ are the mean and dispersion of the field given the knowledge acquired at time $\tau$, respectively.

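These have to be updated when the next learning step is performed. The next Hamiltonian, before it is again replaced by a free one, is

$$H_{\tau+\epsilon}[\phi] = \frac{1}{2}\,\phi^{\dagger} D_{\tau}^{-1} \phi + \epsilon \sum_{n=1}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, \phi_{x_1} \cdots \phi_{x_n}, \qquad (112)$$

if expressed in terms of the momentary uncertainty field $\phi = s - m_\tau$. Here, the perturbative expansion coefficients are given by

$$\Lambda^{(1)} = \kappa_{m_\tau} b + S^{-1} m_\tau - b\, d,$$
$$\Lambda^{(2)} = \kappa_{m_\tau} b^{2} - M_\tau, \quad \text{and}$$
$$\Lambda^{(n)} = \kappa_{m_\tau} b^{n} \quad \text{for } n > 2,$$

with $\kappa_{m_\tau} = \kappa\, e^{b m_\tau}$ and the tensors $\Lambda^{(n \geq 2)}$ diagonal in position space. For simplicity, we assume that $M_\tau$ is diagonal. This is a safe restriction: we will see that for $\tau \to \infty$ this is asymptotically the case, even for a non-diagonal initial $M_0$. Thus we can require that our initial guess was also of this form.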

In order to approximate this Hamiltonian by a free one, we have to calculate the shifted mean field and its connected two-point correlation function, the full propagator. To first order in $\epsilon$, only leaf diagrams with a single perturbative interaction vertex contribute to the perturbed expectation value of $\phi$. Re-summing the loops on the odd vertices as in Eq. (101) gives

$$\langle \phi \rangle_{(s|d)} = -\epsilon\, D_\tau \sum_{n=0}^{\infty} \Lambda^{(2n+1)}\, \frac{D_\tau^{n}}{2^{n}\, n!} = \epsilon\, D_\tau \left(b\, d - S^{-1} m_\tau - b\, \kappa_{m_\tau + b D_\tau/2}\right). \qquad (113)$$

Note that only odd interaction terms shift the expectation value, $m_{\tau+\epsilon} = m_\tau + \langle \phi \rangle_{(s|d)}$. Even ones do not exert any net force in the vicinity of $\phi = 0$, since they represent a potential that is mirror-symmetric about this point.

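The renormalized propagator $D_{\tau+\epsilon}$ is given by the connected two-point correlation function. Up to linear order in $\epsilon$, this is

$$D_{\tau+\epsilon} = \langle \phi\,\phi^{\dagger} \rangle_{(s|d)} - \langle \phi \rangle_{(s|d)} \langle \phi^{\dagger} \rangle_{(s|d)} = D_\tau - \epsilon\, D_\tau \left(b^{2} \kappa_{m_\tau + b D_\tau/2} - M_\tau\right) D_\tau .$$

Rewriting this as an update of $M_\tau$, we find up to linear order in $\epsilon$

$$M_{\tau+\epsilon} = M_\tau + \epsilon \left(b^{2} \kappa_{m_\tau + b D_\tau/2} - M_\tau\right). \qquad (114)$$

Taking the limit $\epsilon \to 0$ yields the integro-differential system

$$\frac{d m_\tau}{d\tau} = D_\tau \left(b\, d - S^{-1} m_\tau - b\, \kappa_{m_\tau + b D_\tau/2}\right), \qquad \frac{d M_\tau}{d\tau} = b^{2} \kappa_{m_\tau + b D_\tau/2} - M_\tau,$$

with $D_\tau = (S^{-1} + M_\tau)^{-1}$. This system converges to a fixed point, which we previously guessed in Eqs. (103) and (104) for our uncertainty-loop enhanced classical equation.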

The classical and the renormalization flow fixed-point equations can be unified:

$$m = S\, b \left(d - \kappa_{m + T b D/2}\right), \qquad D = \left(S^{-1} + b^{2} \kappa_{m + T b D/2}\right)^{-1}, \qquad (117)$$

with $T = 0$ and $T = 1$ for the classical and renormalization result, respectively.

The parameter $T$ is more than a mere convenience. Had we introduced a temperature $T$ at the beginning, via $P(d,s|T) = \exp(-H_d[s]/T)$, Eq. (117) would have been the result of the renormalization flow calculation. The classical limit then naturally corresponds to the zero-temperature regime. There, the system sits at its absolute energy minimum, so the field expectation value is not affected by any uncertainty fluctuations.
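The flow can be integrated numerically in a few lines. The minimal sketch below uses explicit Euler steps in $\tau$ on a hypothetical periodic 1-D grid. The illustrative prior spectrum and the Poissonian data drawn from it are assumptions, not the setup of Figs. 1 to 4. As in Fig. 3, it starts from $m = 0$ and $D_0 = (S^{-1} + b^2\kappa)^{-1}$. The factor $T$ in front of the loop term inside the flow is also an assumption of this sketch. It is chosen so that the fixed point is Eq. (117) for any $T$. For $T = 0$, the flow reduces to a preconditioned iteration towards the classical solution.

```python
import numpy as np


def ring_covariance(n, amplitude=2.0, k0=3.0):
    """Illustrative stationary prior S on a periodic grid, built from a positive power spectrum."""
    k = np.abs(np.fft.fftfreq(n) * n)
    power = amplitude / (1.0 + (k / k0) ** 2) ** 2
    first_row = np.fft.ifft(power).real              # circulant S with eigenvalues `power`
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
    return first_row[idx]


def renormalization_flow(d, S, kappa, b, T=1.0, eps=0.1, n_steps=800):
    """Euler integration of the (m_tau, M_tau) system, loop term weighted by T."""
    S_inv = np.linalg.inv(S)
    m = np.zeros(len(d))
    M = b ** 2 * kappa * np.ones(len(d))             # start from D_0 = (S^-1 + b^2 kappa)^-1
    for _ in range(n_steps):
        D = np.linalg.inv(S_inv + np.diag(M))
        eta = kappa * np.exp(b * m + T * b ** 2 * np.diag(D) / 2)   # kappa_{m + T b D/2}
        m, M = m + eps * D @ (b * (d - eta) - S_inv @ m), M + eps * (b ** 2 * eta - M)
    return m, M, np.linalg.inv(S_inv + np.diag(M))


def fixed_point_residuals(m, M, D, d, S, kappa, b, T):
    """Maximal violation of Eq. (117): m = S b (d - eta) and M = b^2 eta."""
    eta = kappa * np.exp(b * m + T * b ** 2 * np.diag(D) / 2)
    return np.max(np.abs(m - S @ (b * (d - eta)))), np.max(np.abs(M - b ** 2 * eta))


rng = np.random.default_rng(1)
n, kappa, b = 32, 5.0, 0.5
S = ring_covariance(n)
s = np.linalg.cholesky(S) @ rng.standard_normal(n)   # signal realization
d = rng.poisson(kappa * np.exp(b * s))               # Poissonian galaxy counts
maps = {}
for T in (0.0, 0.5, 1.0):
    maps[T] = renormalization_flow(d, S, kappa, b, T)
    print(T, fixed_point_residuals(*maps[T], d, S, kappa, b, T))
```

Near the fixed point, the linearized flow contracts at a rate of about one per unit of $\tau$. The step size `eps = 0.1` is a conservative choice for this toy problem, not a tuned value.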

An example of such reconstructions can be seen in Fig. 2, and its uncertainty structures in Fig. 3. Here, the renormalization equation indeed seems to provide a better result than the classical one. However, a statistical comparison of the two reconstructions using 1000 realizations of the signal and data in Fig. 4 shows that there is at most a marginal difference. This may be surprising, since the classical and renormalization solutions are quite distinct, and the latter is always lower than the former. One might therefore ask whether the two bracket the correct solution. Indeed, intermediate solutions constructed using $T = 1/2$ perform better than the ones for $T = 0$ and $T = 1$, as can be seen in Fig. 4.



If neither $T = 0$ nor $T = 1$ provides the optimal reconstruction, what would be the right choice? We have to remember that we replaced the probability density function at each step of the renormalization scheme by a Gaussian with the correct mean and dispersion. However, the real probability is not Gaussian, and therefore our mean-field estimator is not optimal. Reconstructions with different $T$ probe the non-Gaussian probability structure with Gaussian kernels of different widths in phase space. Due to the anharmonic nature of our Hamiltonian, they therefore result in slightly different signal means.



G. Uncertainty structure 

The remaining uncertainties at the end of the renormalization flow can mainly be read off the renormalized propagator $D$. We display it in Fig. 3 in comparison to the original, un-renormalized one, $D_0$. The renormalized propagator is a much better approximation to the uncertainty dispersion of the signal posterior around the mean map than the original one. One can clearly see that the data imprinted a highly non-uniform structure onto the uncertainty pattern in the renormalized propagator, with small uncertainties where there were many galaxy counts. The density estimator in Eq. (107) also benefits from the knowledge of the uncertainty structure contained in the renormalized propagator, as the lower panel of Fig. 4 shows.

The propagators also visualize the effect that additional data would have at different locations. The height of the propagator values defines the strength of the response to an information source. Their width defines the distance over which its information propagates.

The structure of $D_0$ is imprinted by the prior and the mask. Where $D_0$ is widest, the mask blocks any information source and the structure of the signal prior $S$ becomes visible. Where the mask is transparent, the reconstruction response per information source is lower, as plenty of information can be expected there. The propagator width is also smaller there: thanks to the richer information source density, the individual pieces of information do not need to be propagated as far.

The structure of $D$ additionally bears the imprint of the expected information source density given the reconstruction $m$. The strongly non-linear signal response has led to regions with very high galaxy count rates. These have larger information densities, and therefore lower and narrower information propagators. This implies that additional galaxy detections in the regions with high galaxy counts will have little impact on the updated map, whereas additional detected galaxies in low-density regions will change it more strongly. However, the number of additional galaxies per invested observing time will be larger in high-density regions, which may compensate for the lower information-per-galaxy ratio there. It is therefore interesting to look at the observational information content and how it depends on the actual data realization.

FIG. 4: Top: Statistical reconstruction error from 1000 signal and data realizations. The curves are listed roughly in order from top (bad performance) to bottom (good performance):
- the error $\delta m_{\rm naive} = \langle (s - m_{\rm naive})^2 \rangle^{1/2}_{(d,s)}$ of the signal-covariance-convolved naive map $m_{\rm naive} = S\, s_{\rm naive}$ (see Eq. 97);
- the expected Wiener uncertainty $D_0^{1/2}$;
- the averaged renormalized uncertainty $\langle D_{T=1} \rangle^{1/2}_{(d,s)}$;
- the error of the classical map, $\delta m_{T=0} = \langle (s - m_{T=0})^2 \rangle^{1/2}_{(d,s)}$;
- the error of the renormalized map, $\delta m_{T=1} = \langle (s - m_{T=1})^2 \rangle^{1/2}_{(d,s)}$;
- the error of the intermediate map, $\delta m_{T=0.5} = \langle (s - m_{T=0.5})^2 \rangle^{1/2}_{(d,s)}$.

The lowest curve without label is $\kappa$. Bottom: Error of estimators for the density, $\rho = e^{s}$, namely $\delta\rho_{\rm very\,naive} = \langle (\rho - e^{m_{\rm naive}})^2 \rangle^{1/2}_{(d,s)}$, $\delta\rho_{\rm naive} = \langle (\rho - m_\rho^{\rm naive})^2 \rangle^{1/2}_{(d,s)}$, and $\delta m_\rho = \langle (\rho - m_\rho)^2 \rangle^{1/2}_{(d,s)}$ (see Eqs. 106 and 107).



H. Information gain 

In the case of a free theory, the amount of information depends on the experimental setup and on the prior, but is independent of the data obtained, as we have shown in Sect. IV D 2. This changes when one wants to harvest information in a situation described by a non-linear IFT. There, the amount of information can strongly depend on the actual data.

This is well illustrated by our LSS reconstruction problem. Both the bias factor and the signal amplitude control the strength of the non-linear interactions. A perturbative calculation of the non-linear information gain is possible if either of them is small compared to unity.



The information gain, as given by Eq. (83), expanded to the first few orders in $b$,

A/i = ^Trlog(l-hS'/^&2^ (118) 



clearly depends on the actual realization of the data. The different fluctuations in the Wiener map $m_p = D_0\, j$, with $D_0 = (S^{-1} + b^{2} \kappa)^{-1}$ and $j = b\,(d - \kappa)$, imply positive and negative information density fluctuations.

The signal amplitude can, for example, be made small by defining the signal of interest to be the cosmic density field, smoothed on a sufficiently large scale (> 10 Mpc) so that $\langle s^{2} \rangle_{(s)} < 1$.

FIG. 5: Information gain density (the integrands of Eqs. 118 and 119) for the two reconstruction examples presented: the only weakly non-linear one (top, and Fig. 1) and the strongly non-linear one (bottom, and Fig. 2). Shown are the renormalization result for $T = 1$ (Eq. 119) and the zero- and first-order perturbative results (Eq. 118). The information gain depends on the observational sensitivity as well as on the actual data. The latter influence is stronger in the non-linear regime, and disappears in linear inference problems.

To conveniently calculate the information gain of the observation in the case of a large bias factor, we use the Gaussian approximation of the joint probability function, as provided by the renormalization scheme. Due to the Gaussianity of this approximate solution, we can simply use the formula for the information gain of a free theory, as given by Eq. (88). This yields



$$\Delta i_d = \frac{1}{2}\,\mathrm{Tr}\left[\log\left(1 + S\, b^{2} \eta\right)\right], \qquad (119)$$

with $\eta = \kappa_{m + bD/2} = \kappa\, e^{b m + b^{2} D/2}$ being proportional to the expected number density of galaxies in this region (see Eq. 107). Here, too, it is obvious that the information gain depends on the data. In regions with higher observed galaxy numbers, $\eta$ is larger, and more information is expected to be harvested by further observations. This is illustrated in Fig. 5 for the cases displayed in Figs. 1 and 2. The figure shows the information gain density, i.e. the individual contributions to the trace in Eq. (119), as well as the zero- and first-order terms of Eq. (118). The approximate Eq. (118) seems to be adequate for $b \ll 1$, but not for our cases of $b = 0.5$ and $2.5$.

The expected benefit of additional observations at location $x$ can also be calculated by differentiating Eq. (119) with respect to $\kappa(x)$. Using Eqs. (117) and (88) we find

$$\left\langle \frac{\partial \Delta i_d}{\partial \kappa_x} \right\rangle_{(\text{new data}|d)} \simeq \frac{b^{2}}{2}\, \frac{\eta_x}{\kappa_x}\, D_{xx}. \qquad (120)$$

The expected information gain is especially large for observations at three kinds of locations: where the uncertainty $D$ is large, where a large number density of galaxies ($\propto \eta$) can be expected, and where strong non-linearities are present ($\propto b^{2}$). The inverse term, $D = (S^{-1} + b^{2} \eta)^{-1}$, caps the maximally available information gain at some level. For the two reconstruction examples given in Figs. 1 and 2, we display the expected information gain as a function of the observing position in Fig. 6.
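The sketch below evaluates the information gain in the form $\Delta i_d = \tfrac{1}{2}\mathrm{Tr}\log(1 + S b^2 \eta)$, with $\eta = \kappa e^{bm + b^2 D/2}$. It also computes the sensitivity of $\Delta i_d$ to the zero-response at each position. As a simplification, it holds $m$ and $D$ fixed when $\kappa_x$ is varied. The derivative is then $\tfrac{b^2}{2}(\eta_x/\kappa_x)[(S^{-1} + b^2\eta)^{-1}]_{xx}$, which a central finite difference confirms. The prior covariance and all inputs are random illustrative values, and the function names are hypothetical.

```python
import numpy as np


def information_gain(S, kappa, b, m, D_diag):
    """0.5 * Tr log(1 + S b^2 eta) with eta = kappa exp(b m + b^2 D / 2)."""
    eta = kappa * np.exp(b * m + b ** 2 * D_diag / 2)
    sign, logdet = np.linalg.slogdet(np.eye(len(m)) + S @ np.diag(b ** 2 * eta))
    return 0.5 * logdet                      # Tr log A = log det A (sign is +1 here)


def information_gain_gradient(S, kappa, b, m, D_diag):
    """d(Delta i)/d kappa_x at fixed m and D."""
    eta = kappa * np.exp(b * m + b ** 2 * D_diag / 2)
    D_eta = np.linalg.inv(np.linalg.inv(S) + np.diag(b ** 2 * eta))
    return 0.5 * b ** 2 * (eta / kappa) * np.diag(D_eta)


rng = np.random.default_rng(0)
n, b = 6, 0.5
A = rng.standard_normal((n, n))
S = A @ A.T / n + 0.5 * np.eye(n)            # hypothetical positive-definite prior covariance
kappa = rng.uniform(1.0, 5.0, n)
m = 0.3 * rng.standard_normal(n)
D_diag = rng.uniform(0.1, 0.5, n)

grad = information_gain_gradient(S, kappa, b, m, D_diag)
h = 1e-6
fd = np.array([(information_gain(S, kappa + h * e, b, m, D_diag)
                - information_gain(S, kappa - h * e, b, m, D_diag)) / (2 * h)
               for e in np.eye(n)])
print(grad, np.max(np.abs(grad - fd)))
```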

The top panel shows the case of uniform observational coverage. It is apparent there that additional observations are more advantageous at locations where an increased matter density has already been identified. The bottom panel shows the case of a very inhomogeneous observation of strongly non-linear data. It demonstrates that filling observational gaps should have the highest priority. But there again, regions where the extrapolated galaxy density seems to be larger should be preferred. This can be seen from the asymmetric shape of the expected information gain for observations in the gap around $x = 0.2$. In this example, the information harvest of high galaxy density regions can be very large. Further observations of the already well-observed regions at the boundary of the domain then seem more advantageous than improving the poorly observed regions around $x = 0.4$, where a low galaxy density is already apparent from the existing data.

FIG. 6: Differential information gain density for the two reconstruction examples presented: the only weakly non-linear one (top, and Fig. 1) and the strongly non-linear one (bottom, and Fig. 2).

Of course, in order to plan observations in a real case, further factors have to be folded into the considerations. These are the observational costs as a function of location $x$ and the zero-response $\kappa(x)$ already achieved there.



VI. NON-GAUSSIAN CMB FLUCTUATIONS VIA $f_{\rm NL}$-THEORY

A. Data model 

As an IFT example on the sphere $S^{2}$ involving two interacting uncertainty fields, we investigate the so-called $f_{\rm NL}$-theory of local non-Gaussianities in the CMB temperature fluctuations. This problem currently has high scientific relevance due to the strongly increasing availability of high-fidelity CMB measurements. These permit us to constrain the physical conditions at very early epochs of the Universe. The relevant references for this topic were provided in Sect. II C 5.

On top of the very uniform CMB sky with a mean temperature $T_{\rm CMB}$, small temperature fluctuations are observed or expected. Their levels are $\delta T/T_{\rm CMB} \sim 10^{-5}$, $10^{-6}$, and $10^{-7}$ in total intensity (Stokes I) and in polarization E- and B-modes, respectively. The weak B-modes are mainly due to lensing of E-modes and some unknown level of gravitational waves. We will disregard them in the following. These CMB temperature fluctuations are believed, and observed, to follow a mostly Gaussian distribution. However, inflation predicts some level of non-Gaussianity. Some of the secondary anisotropies imprinted by the LSS of the Universe should also carry non-Gaussian signatures [...]. These arise via CMB lensing and the Integrated Sachs-Wolfe and Rees-Sciama effects. The primordial CMB temperature fluctuations, as well as some of the secondary ones, are a response to the gravitational potential initially seeded during inflation. Since we are interested in primordial fluctuations, we write

$$d = R\,\psi + n, \qquad (121)$$

where $\psi$ is the three-dimensional primordial gravitational potential. $R$ is the response of a CMB instrument to it, observing the induced CMB temperature fluctuations in intensity and E-mode polarization. These fluctuations are imprinted by a number of effects, like gravitational redshifting, the Doppler effect, and anisotropic Thomson scattering. Suppose the data of the instrument are foreground-cleaned and deconvolved all-sky maps, with the data processing regarded as part of the instrument. The response then translates the 3-d gravitational field into temperature maps. It is well known from CMB theory and can be calculated with publicly available codes like CMBFAST, CAMB, and CMBEASY (see Sect. II C 5). The precise form of the response does not matter for the development of the basic concept, and can be inserted later.

Finally, the noise $n$ subsumes all deviations of the measurement from the signal response that are due to instrumental and physical effects not linearly correlated with the primordial gravitational potential. Examples are detector noise, remnants of foreground signals, and also primordial gravitational wave contributions to the CMB fluctuations.

The small level of non-Gaussianity expected in the CMB temperature fluctuations is a consequence of some non-Gaussianity in the primordial gravitational potential. There is no generic non-Gaussian probability function. Still, many of the inflationary non-Gaussianities seem to be well described by a local process that taints an initially Gaussian random field $\phi$ with some level of non-Gaussianity. Here $P(\phi) = \mathcal{G}(\phi, \Phi)$, with the $\phi$-covariance $\Phi = \langle \phi\, \phi^{\dagger} \rangle_{(\phi)}$. A well-controllable realization of such a tarnishing operation is provided by a slightly non-linear transformation of $\phi$ into the primordial gravitational potential $\psi$ via

$$\psi(x) = \phi(x) + f_{\rm NL} \left(\phi^{2}(x) - \langle \phi^{2}(x) \rangle_{(\phi)}\right) \qquad (122)$$

for any $x$. The parameter $f_{\rm NL}$ controls the level and nature of non-Gaussianity via its absolute value and sign, respectively. This means that our data model reads

$$d = R\left(\phi + f\,(\phi^{2} - \Phi)\right) + n, \qquad (123)$$

where we dropped the subscript of $f_{\rm NL}$. Inside the bracket, $\Phi$ stands for the diagonal $\Phi_{xx} = \langle \phi^{2}(x) \rangle_{(\phi)}$. In the following we assume the noise $n$ to be Gaussian with covariance $N = \langle n\, n^{\dagger} \rangle_{(n)}$, and we define as usual $M = R^{\dagger} N^{-1} R$ for notational convenience.



Non-Gaussian noise components are in fact expected, and would need to be included in the construction of an optimal $f_{\rm NL}$-reconstruction. However, here we aim only at outlining the principles. We are furthermore not aware of any traditional $f_{\rm NL}$-estimator constructed while taking such noise into account. Finally, we show at the end how to identify some such non-Gaussian noise sources by producing $f_{\rm NL}$-maps on the sphere. These can be compared morphologically to known foreground structures, like our Galaxy.



B. Spectrum, bispectrum, and trispectrum 

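The non-linearity of the relation between the hidden Gaussian random field $\phi$ and the observable gravitational potential $\psi$ (Eq. 122) imprints non-Gaussianity into the latter. We want to extract the value of the non-Gaussianity parameter $f$ from any data containing information on $\psi$. To do so, we need to know the statistics of $\psi$ at least up to the four-point function, the trispectrum, which we briefly derive with IFT methods.

To that end, it is convenient to define a $\psi$-moment generating function $Z[J]$ and its logarithm

$$\log Z[J] = \log \int \mathcal{D}\phi\, P(\phi)\, e^{J^{\dagger}\psi} = \frac{1}{2}\, J^{\dagger} \left(\Phi^{-1} - 2\, \widehat{fJ}\right)^{-1} J - (fJ)^{\dagger}\Phi - \frac{1}{2}\,\mathrm{Tr}\left[\log\left(1 - 2\,\Phi\, \widehat{fJ}\right)\right], \qquad (124)$$

where $\widehat{fJ}$ denotes the diagonal operator with entries $f_x J_x$, and $(fJ)^{\dagger}\Phi = \sum_x f_x J_x \Phi_{xx}$. Via $J$-derivatives (see Eq. 35), this permits us to calculate the following statistics of the gravitational potential. The mean is

$$\bar{\psi} = \langle \psi \rangle_{(\phi)} = 0. \qquad (125)$$

The spectrum (or covariance) is

$$\Psi_{xy} = \langle \psi_x\, \psi_y \rangle_{(\phi)} = \Phi_{xy} + 2 f_x \Phi_{xy}^{2} f_y. \qquad (126)$$

The bispectrum is

$$B_{xyz} = \langle \psi_x \psi_y \psi_z \rangle_{(\phi)} = 2\left[\Phi_{xy} f_y \Phi_{yz} + \Phi_{yz} f_z \Phi_{zx} + \Phi_{zx} f_x \Phi_{xy}\right] + 8\, \Phi_{xy} f_y \Phi_{yz} f_z \Phi_{zx} f_x. \qquad (127)$$

The trispectrum is

$$T_{xyzu} = \langle (\psi - \bar\psi)_x (\psi - \bar\psi)_y (\psi - \bar\psi)_z (\psi - \bar\psi)_u \rangle_{(\phi)}$$
$$= \Phi_{xy}\Phi_{zu} + \Phi_{xz}\Phi_{yu} + \Phi_{xu}\Phi_{yz} + \sum_{\{a,b\}} f_a f_b \left[2\,\Phi_{ab}^{2}\Phi_{ce} + 4\,\Phi_{ab}\left(\Phi_{ac}\Phi_{be} + \Phi_{ae}\Phi_{bc}\right)\right]$$
$$+ f_x f_y f_z f_u \left[4\left(\Phi_{xy}^{2}\Phi_{zu}^{2} + \Phi_{xz}^{2}\Phi_{yu}^{2} + \Phi_{xu}^{2}\Phi_{yz}^{2}\right) + 16\left(\Phi_{xy}\Phi_{yz}\Phi_{zu}\Phi_{ux} + \Phi_{xy}\Phi_{yu}\Phi_{uz}\Phi_{zx} + \Phi_{xz}\Phi_{zy}\Phi_{yu}\Phi_{ux}\right)\right], \qquad (128)$$

where the sum runs over the six index pairs $\{a,b\} \subset \{x,y,z,u\}$, and $\{c,e\}$ denotes the complementary pair.

We will investigate the possibility of a spatially varying non-Gaussianity parameter at the end of this section. We therefore keep track of the spatial coordinate of $f$, but for the time being read $f_x = f$.

Since the bispectrum contains most of the non-Gaussianity signature, we also provide its Fourier-space version, which is well known for the $f_{\rm NL}$-model [e.g. 217]. The bispectrum for $f = \mathrm{const}$, expressed in terms of the $\psi$-covariance, reads

$$B_{xyz} = 2f\left(\Psi_{xy}\Psi_{yz} + \Psi_{xz}\Psi_{zy} + \Psi_{yx}\Psi_{xz}\right) + \mathcal{O}(f^{3}).$$

Fourier transforming this yields

$$B_{k_1 k_2 k_3} = 2f\,(2\pi)^{3}\,\delta(k_1 + k_2 + k_3)\left[P(k_1)P(k_2) + P(k_2)P(k_3) + P(k_3)P(k_1)\right] + \mathcal{O}(f^{3}),$$

where $P(k)$ is the power spectrum of $\psi$, which is identical to that of $\phi$ up to $\mathcal{O}(f^{2})$.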
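At coincident points, the statistics of the local model reduce to single-variable moments that follow from Wick's theorem. They are $\langle\psi^2\rangle = \Phi + 2f^2\Phi^2$, $\langle\psi^3\rangle = 6f\Phi^2 + 8f^3\Phi^3$, and $\langle\psi^4\rangle = 3\Phi^2 + 60f^2\Phi^3 + 60f^4\Phi^4$. The sketch below checks them with Gauss-HermiteE quadrature, which is exact for these polynomials. The function names are hypothetical. The dimensionless values of $f$ and $\Phi$ are illustrative only, since realistic potentials have $\Phi$ of order $10^{-10}$.

```python
import numpy as np


def psi_moments(f, Phi, n_nodes=12):
    """First four moments of psi = phi + f (phi^2 - Phi), phi ~ N(0, Phi), at a single point.

    Gauss-HermiteE quadrature with 12 nodes is exact up to polynomial degree 23,
    which covers psi^4 (degree 8 in phi).
    """
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2 * np.pi)                    # normalize to the standard normal density
    phi = np.sqrt(Phi) * x
    psi = phi + f * (phi ** 2 - Phi)
    return np.array([np.sum(w * psi ** k) for k in range(1, 5)])


def closed_form(f, Phi):
    """Coincident-point limits x = y = z = u of Eqs. (125)-(128)."""
    return np.array([
        0.0,                                                            # mean
        Phi + 2 * f ** 2 * Phi ** 2,                                    # spectrum
        6 * f * Phi ** 2 + 8 * f ** 3 * Phi ** 3,                       # bispectrum
        3 * Phi ** 2 + 60 * f ** 2 * Phi ** 3 + 60 * f ** 4 * Phi ** 4,  # four-point function
    ])


print(psi_moments(0.3, 0.7))
print(closed_form(0.3, 0.7))
```

The third moment changes sign with $f$, while the fourth depends only on $f^2$. This is why the trispectrum alone cannot fix the sign of $f_{\rm NL}$.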

The spectrum, bispectrum, and trispectrum of our CMB measurement can easily be calculated from the gravitational spectrum, bispectrum, and trispectrum, respectively:

$$C^{(d)}_{n_1 n_2} = \left(R\,\Psi R^{\dagger}\right)_{n_1 n_2} + N_{n_1 n_2},$$
$$B^{(d)}_{n_1 n_2 n_3} = R_{n_1 x} R_{n_2 y} R_{n_3 z}\, B_{xyz},$$
$$T^{(d)}_{n_1 n_2 n_3 n_4} = R_{n_1 x} R_{n_2 y} R_{n_3 z} R_{n_4 u}\, T_{xyzu} + \sum_{\{a,b\}} \left(R\,\Psi R^{\dagger}\right)_{n_a n_b} N_{n_c n_e} + N_{n_1 n_2} N_{n_3 n_4} + N_{n_1 n_3} N_{n_2 n_4} + N_{n_1 n_4} N_{n_2 n_3}. \qquad (129)$$

Here $n_i$ denotes a unit vector on the sphere, and the sum again runs over the six index pairs with complementary pair $\{c,e\}$. We have made use of the assumption that the noise is Gaussian and independent of the signal. More terms have to be added to these expressions if the noise itself has a bi- or trispectrum, or if there is signal-dependent noise, e.g. due to an incorrect instrument calibration. The usually quoted formulae [e.g. ...] can be obtained from Eq. (129) by applying spherical harmonic transformations.



C. CMB-Hamiltonian 

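Although we are not interested in the auxiliary field $\phi$, it is nevertheless very useful for its marginalization to define its Hamiltonian, which is

$$H_f[d,\phi] = -\log\left(\mathcal{G}(\phi, \Phi)\, \mathcal{G}\!\left(d - R\,(\phi + f(\phi^{2} - \Phi)),\, N\right)\right) = \frac{1}{2}\,\phi^{\dagger} D^{-1} \phi + H_0 - j^{\dagger}\phi + \sum_{n=0}^{4} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, \phi_{x_1} \cdots \phi_{x_n},$$

with

$$D^{-1} = \Phi^{-1} + R^{\dagger} N^{-1} R = \Phi^{-1} + M, \qquad j = R^{\dagger} N^{-1} d,$$
$$\Lambda^{(0)} = j^{\dagger}(f\Phi) + \frac{1}{2}\,(f\Phi)^{\dagger} M\, (f\Phi),$$
$$\Lambda^{(1)} = -(f\Phi)^{\dagger} M \quad \text{and} \quad j' = j - \Lambda^{(1)\dagger},$$
$$\Lambda^{(2)} = -2\, f\, j',$$
$$\Lambda^{(3)}_{xyz} = \left(M_{xy}\, f_y\, \delta_{yz} + 5 \text{ permutations}\right),$$
$$\Lambda^{(4)}_{xyzu} = \frac{1}{2}\left(f_x\, \delta_{xy}\, M_{yz}\, \delta_{zu}\, f_u + 23 \text{ permutations}\right), \qquad (130)$$

where $H_0$ collects all terms independent of $\phi$ and $f$. Here $f\Phi$ denotes the vector with entries $f_x \Phi_{xx}$, and $\Lambda^{(2)}$ is diagonal. The last two tensors should be read without the Einstein sum convention, but with all possible index permutations. Note that this is a non-local theory for $\phi$ if either the noise covariance or the response matrix is non-diagonal. This yields a non-local $M$ and therefore non-local interactions $\Lambda^{(3)}$ and $\Lambda^{(4)}$.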

We should note that Babich [220] derived the now traditional $f_{\rm NL}$-estimator from a very similar starting point, the log-probability for $\psi$. The difference between the resulting estimators is not due to the slightly different approaches ($H_f[d,\psi]$ versus $H_f[d,\phi]$). It is due to the frequentist and Bayesian statistics that he and we use, respectively.

The noise and the response are often assumed to be diagonal in position space for properly cleaned CMB maps. This is also approximately valid on large angular scales, where the Sachs-Wolfe effect dominates. In that case, for the total intensity fluctuations, we have $N_{xy} = \sigma^{2}(x)\, \delta(x - y)$ and $R = -3$ [215]. Restricting the signal space to the last-scattering surface, which we identify with $S^{2}$, we thus get $M_{xy} = 9\, \sigma^{-2}(x)\, \delta(x - y)$. This permits us to simplify the Hamiltonian to

$$H_f[d,\phi] = \frac{1}{2}\,\phi^{\dagger} D^{-1}\phi + H_0 - j'^{\dagger}\phi + \Lambda^{(0)} + \sum_{n=2}^{4} \frac{1}{n!} \sum_{x} \Lambda^{(n)}_{x}\, \phi_x^{n}, \quad \text{with}$$
$$D^{-1} = \Phi^{-1} + 9\,\sigma^{-2}, \qquad j' = j - \Lambda^{(1)\dagger} = 3\,(3\,\Phi f - d)/\sigma^{2},$$
$$\Lambda^{(0)} = 3\,(f\Phi/\sigma^{2})^{\dagger}\left(\tfrac{3}{2}\, f\Phi - d\right), \qquad \Lambda^{(2)} = -2\, f\, j',$$
$$\Lambda^{(3)} = 54\, f/\sigma^{2}, \quad \text{and} \quad \Lambda^{(4)} = 108\, f^{2}/\sigma^{2}. \qquad (131)$$

The numerical coefficients of the last two terms may look large. However, they stand in front of terms of typically $\phi^{3} \sim 10^{-15}$ and $\phi^{4} \sim 10^{-20}$, which ensures their well-behavedness in any diagrammatic expansion series.

For later use, we define the Wiener-filter reconstruction of the gravitational potential as $m_0 = D\, j$.
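To check the coefficients of Eq. (131), the sketch below compares two computations of the likelihood part of the Hamiltonian. One is taken directly from the data model, Eq. (123) with $R = -3$ and pixel-diagonal noise. The other is its expansion in powers of $\phi$. The prior term is left out on both sides, and the function names are hypothetical. Toy values of order one are used so that the cubic and quartic terms are not negligible.

```python
import numpy as np


def H_direct(phi, d, f, Phi_diag, sigma2, R=-3.0):
    """Likelihood part of H_f[d, phi] from Eq. (123), pixel-diagonal response and noise."""
    psi = phi + f * (phi ** 2 - Phi_diag)
    return 0.5 * np.sum((d - R * psi) ** 2 / sigma2)


def H_expanded(phi, d, f, Phi_diag, sigma2):
    """The same quantity rebuilt from the coefficients of Eq. (131), assuming R = -3.

    The prior term (1/2) phi^T Phi^-1 phi is omitted; H0 here holds only (1/2) sum d^2 / sigma^2.
    """
    H0 = 0.5 * np.sum(d ** 2 / sigma2)
    M = 9.0 / sigma2                                  # R^T N^-1 R
    j_prime = 3.0 * (3.0 * Phi_diag * f - d) / sigma2
    lam0 = np.sum(3.0 * f * Phi_diag / sigma2 * (1.5 * f * Phi_diag - d))
    lam2 = -2.0 * f * j_prime
    lam3 = 54.0 * f / sigma2
    lam4 = 108.0 * f ** 2 / sigma2
    return (H0 + lam0 + 0.5 * np.sum(M * phi ** 2) - np.sum(j_prime * phi)
            + np.sum(lam2 * phi ** 2 / 2 + lam3 * phi ** 3 / 6 + lam4 * phi ** 4 / 24))


# Dimensionless toy values (real potentials are ~1e-5), so that every order contributes.
rng = np.random.default_rng(3)
npix = 50
phi = 0.5 * rng.standard_normal(npix)
Phi_diag = np.full(npix, 0.3)
sigma2 = rng.uniform(0.5, 2.0, npix)
d = rng.standard_normal(npix)
for f in (0.0, 0.7, -1.3):
    print(f, H_direct(phi, d, f, Phi_diag, sigma2), H_expanded(phi, d, f, Phi_diag, sigma2))
```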



D. $f_{\rm NL}$-evidence and map making

For the moment, we are not interested in reconstructing the primordial fluctuations, but in extracting knowledge on $f_{\rm NL}$. We therefore marginalize over the former by calculating the log-evidence $\log P(d|f)$ up to quadratic order in $f$:

$$\log Z_f[d] = \log \int \mathcal{D}\phi\, P(d,\phi|f) = \log \int \mathcal{D}\phi\, e^{-H_f[d,\phi]} = \frac{1}{2}\log|2\pi D| - H_0 - \Lambda^{(0)} + [\text{connected diagrams}] + \mathcal{O}(f^{3}). \qquad (132)$$

Here, the connected diagrams include the pure source term $\frac{1}{2}\, j'^{\dagger} D\, j'$. We have made use of the fact that the logarithm of the partition sum is provided by all connected diagrams. Moreover, $j'$ contains a term of order $\mathcal{O}(f^{0})$, $\Lambda^{(2)}$ and $\Lambda^{(3)}$ contain terms of order $\mathcal{O}(f)$, and $\Lambda^{(4)}$ one of order $\mathcal{O}(f^{2})$. In diagrams up to order $\mathcal{O}(f^{2})$, they can therefore appear an unrestricted number of times, twice, and once, respectively. Since only 4th-order interactions are involved, an implementation in spherical harmonics space may be feasible. It would only need the 4th-order C-coefficients (Eq. B3), which can be calculated computer-algebraically. Finally, we recall

$$\frac{1}{2}\log|2\pi D| = \frac{1}{2}\,\mathrm{Tr}\left(\log(2\pi D)\right). \qquad (133)$$



Although $f$ is not known, the expressions in Eq. (132) proportional to $f$ and $f^{2}$ can be calculated separately. If a suitable prior $P(f)$ is chosen, this permits us to write down the Hamiltonian of $f$,

$$H_d[f] = -\log\left(P(d|f)\, P(f)\right) = H_0 + \frac{1}{2}\, f^{\dagger} D^{-1} f - j^{\dagger} f + \mathcal{O}(f^{3}), \qquad (134)$$

where we collected the linear and quadratic coefficients into $j$ and $D^{-1}$. The optimal $f$-estimator to lowest order is therefore

$$m_f = \langle f \rangle_{(f|d)} = D\, j, \qquad (135)$$

and its uncertainty variance is just

$$\left\langle (f - m_f)(f - m_f)^{\dagger} \right\rangle_{(f|d)} = D. \qquad (136)$$
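Given the linear and quadratic coefficients of Eq. (134), the $f$-posterior is Gaussian. The minimal sketch below returns its mean and covariance. It uses hypothetical numbers for a single universal $f$, and it assumes that a Gaussian prior contributes $\sigma_f^{-2}$ to $D^{-1}$. `np.linalg.solve` avoids forming $D$ when only the mean is needed. The same function also handles a vector of $f_{lm}$ coefficients.

```python
import numpy as np


def fnl_posterior(j, D_inv):
    """Mean (Eq. 135) and covariance (Eq. 136) from the coefficients of Eq. (134)."""
    D_inv = np.atleast_2d(np.asarray(D_inv, dtype=float))
    j = np.atleast_1d(np.asarray(j, dtype=float))
    mean = np.linalg.solve(D_inv, j)     # m_f = D j
    cov = np.linalg.inv(D_inv)           # D
    return mean, cov


# Hypothetical coefficients: information source j and precision sigma_f^-2 + data part.
sigma_f = 100.0
mean, cov = fnl_posterior(j=[0.02], D_inv=[[sigma_f ** -2 + 1e-3]])
print(mean[0], np.sqrt(cov[0, 0]))
```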



So far, we have assumed $f$ to have a single universal value. However, we can also permit $f$ to vary spatially, or on the sphere of the sky. In the latter case one would expand $f$ as

$$f(\hat{n}) = \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} f_{lm}\, Y_{lm}(\hat{n}) \qquad (137)$$

up to some finite $l_{\max}$. One would then recalculate the partition sum, now separately for terms proportional to $f_{lm}$ and to $f_{lm} f_{l'm'}$. These are sorted into the vector coefficients of $j$ and the matrix coefficients of $D^{-1}$, respectively, according to

$$j_{(lm)} = -\left.\frac{\partial H_d[f]}{\partial f_{lm}}\right|_{f=0} \quad \text{and} \quad D^{-1}_{(lm)(l'm')} = \left.\frac{\partial^{2} H_d[f]}{\partial f_{lm}\, \partial f_{l'm'}}\right|_{f=0}. \qquad (138)$$

$f$-map making can then proceed as described above in spherical harmonics space. The resulting map can be compared in angular space to known foreground sources, such as our Galaxy. This may allow us to assess the level of non-Gaussian contamination due to their imperfect removal from the data.



E. Comparison to traditional estimator 

We conclude this chapter with a short comparison to traditional $f_{\rm NL}$-estimators. To our knowledge, the most developed estimator in the literature is based on the CMB bispectrum, the third-order correlation function of the data [e.g. 220, 221, and references in Sect. II C 5]. The IFT-based filter presented here contains terms up to fourth order in the data. Since both methods are supposed to be optimal within their respective class of estimators, the IFT filter can therefore be expected to be more accurate. Kogo and Komatsu [219] note that the CMB trispectrum should contain significant information on $f_{\rm NL}^{2}$. On small angular scales, it may even be superior to the bispectrum for non-Gaussianity detection. However, since the trispectrum is insensitive to the sign of $f_{\rm NL}$, its actual usage as a proxy is a bit more subtle. In the IFT estimator, any term proportional to $f_{\rm NL}^{2}$ enters the inverse of the propagator $D$. The trispectrum therefore seems to unfold its $f_{\rm NL}$-estimation power mostly in combination with the bispectrum, which drives $j$.

Under which conditions does the traditional estimator emerge from the IFT one? There are three conceptual differences between the estimators. The IFT filter can handle inhomogeneous non-Gaussianity, it corrects for chance coupling between the CMB sky and the exposure, and it is unbiased with respect to the posterior.

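The traditional estimator is usually written as

$$\varepsilon = \frac{1}{\mathcal{N}} \int dx\, A(x)\, B^{2}(x) = \frac{1}{\mathcal{N}}\, (m_0^{2})^{\dagger}\, \Phi^{-1} m_0, \qquad (139)$$

where $B = D\, j = m_0$ is the Wiener-filter reconstruction of the gravitational potential. $A = \Phi^{-1} B$ is the same map, just additionally filtered by the inverse power spectrum, and $\mathcal{N}$ is a normalization constant [e.g. 202]. $\mathcal{N}$ is fixed by the condition that the estimator should be unbiased with respect to all signal and noise realizations,

$$\mathcal{N} = \left\langle (m_0^{2})^{\dagger}\, \Phi^{-1} m_0 \right\rangle_{(d,s|f=1)} = B_{xyz}\big|_{f=1} \left[(MD)_{zv}\, \Phi^{-1}_{vu}\, (DM)_{ux}\, (DM)_{uy}\right]$$
$$\simeq 2\left[\Phi_{xy}\Phi_{yz} + \Phi_{yz}\Phi_{zx} + \Phi_{zx}\Phi_{xy}\right] \left[(MD)_{zv}\, \Phi^{-1}_{vu}\, (DM)_{ux}\, (DM)_{uy}\right], \qquad (140)$$

where the last step neglects the much smaller term $8\,\Phi\Phi\Phi$ of Eq. (127).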



The first difference between the estimators is obvious: the IFT estimator can handle a spatially varying $f(x)$. We will therefore only regard spatially constant non-linearity parameters in the following. Since no CMB experiment is able to measure the monopole temperature fluctuation, the response to any spatially homogeneous signal is zero. In Fourier basis, this means that $R_{n,k=0} = 0$, and therefore $j_{k=0} = M_{k=0,k'} = 0$. For a Universe with homogeneous statistics, $\Phi_{xx}$ is constant, so that $f\Phi$ has only a $k = 0$ component. Thus we find $\Lambda^{(0)} = \Lambda^{(1)} = 0$, $j' = j$, and $\Lambda^{(2)} = -2 f j$, which reduces the number of diagrams we have to calculate.

The IFT estimator is driven by the $f$-information source $j$, which is given by all diagrams containing terms linear in $f$. There are four of them, yielding

$$j = \sum_{xy}\left[ m_{0x}^{2}\, \Phi^{-1}_{xy}\, m_{0y} + m_{0x} \left(\Phi^{-1}_{xy}\, D_{yy} - 2\, M_{xy}\, D_{xy}\right)\right], \qquad (141)$$

where we used $M = D^{-1} - \Phi^{-1}$ to combine the two tree diagrams into the first term and the two loop diagrams into the second. The term resulting from the tree diagrams is actually identical to the unnormalized traditional estimator, $\mathcal{N}\varepsilon$ (Eq. 139).

The terms resulting from the loop diagrams vanish for a homogeneous $M$, which a CMB experiment with uniform exposure and constant noise could produce. The more realistic case is an inhomogeneous $M$. There, the loop term does not vanish. It corrects for chance correlations between the CMB realization (as seen through $j$) and the noise and response structure of the experiment (as encoded in $M$ and $D$). Creminelli et al. [222] already pointed out that such a linear correction term is necessary for inhomogeneous sky coverage.

The second difference between the estimators is thus that the IFT-based one applies a correction for chance correlations of CMB sky and sky exposure, while the traditional one does not. This term is absent in the traditional estimator because the latter was constructed as the optimal estimator that is third order in the data. That construction excluded the loop term, which is linear in the data.

Including this term in the traditional estimator is straightforward, and it is actually done in the more recent $f_{\rm NL}$ measurements [e.g. 223]. The normalization constant $\mathcal{N}$ is unaffected by this, since the expectation value of the loop term, averaged over all possible signal realizations, is zero.

This brings us to the third difference between the estimators: the normalization. The traditional estimator is normalized by a data-independent constant $\mathcal{N}$, whereas the IFT estimator is normalized by a data-dependent term



$$ D_f^{-1} = \sigma_f^{-2} + \ldots \qquad (142) $$




where only the first three diagrams are data independent and $\sigma_f^2$ is the variance of the prior, which we assume to be $P(f) = \mathcal{G}(f, \sigma_f^2)$. The detailed expressions for the different diagrams can be found in Appendix C. For both estimators, the traditional and the IFT one, the normalization is meant to guarantee unbiasedness, but with respect to different probability distributions.



The traditional estimator is unbiased in the frequentist sense, i.e. for an average over all realizations of the signal $f$ and of the data. The IFT estimator, in contrast, is unbiased in the Bayesian sense, i.e. with respect to the posterior, the probability distribution of all signals given the data. Since the data are given, and not assumed to vary any more after the observation is performed, they can and should affect the normalization constant, which encodes the sensitivity of our non-Gaussianity estimation.

The reason why the IFT normalization constant (or $f$-propagator) is data dependent can be understood as follows. Some data realizations are better suited than others to reveal the presence of non-Gaussianities, even if they have identical $j_f$. Such a dependence of the detectability of an effect on the concrete data realization is common in non-linear Bayesian inference, and was even more prominent in the example of the reconstruction of a log-normal density field in Sect. V.



VII. SUMMARY AND OUTLOOK 

Starting with fundamental information-theoretical considerations about the nature of measurements, signals, noise and their relation to a physical reality given a model of the Universe or the system under consideration, we reformulated the inference problem in the language of information field theory (IFT). IFT is actually a statistical field theory. The information field is identified with a spatially distributed signal, which can freely be chosen by the scientist according to needs and technical constraints. The mathematical apparatus of field theory allows one to deal with the ensemble of all possible field configurations given the data and prior information in a consistent way.

With this conceptual framework, we derived the Hamiltonian of the theory, showed that the free theory reproduces the well-known results of Wiener-filter theory, and presented the Feynman rules for non-linear, interacting Hamiltonians in general and in particular cases. The latter are information fields over Fourier and spherical-harmonics spaces for inference problems in $\mathbb{R}^n$ and $S^2$, respectively. Our "philosophical" considerations allowed us to argue why the resulting IFTs are usually well normalized, but often non-local. Since the propagator of the theory is closely related to the Wiener filter, for which efficient numerical algorithms exist nowadays in the form of image reconstruction and map-making codes, and the information source term is usually a noise-weighted version of the data, the necessary computational tools are at hand to convert the diagrammatic expressions into well-performing algorithms.

Furthermore, we provided the Boltzmann-Shannon information measure of IFT based on the Helmholtz free energy, thereby highlighting the embedding of IFT in the framework of statistical mechanics.

As examples of the IFT recipe, two concrete IFT problems with cosmological motivation were discussed, which are also intended as blueprints for other inference problems. The first targeted the problem of reconstructing the spatially continuous cosmic LSS matter distribution from discrete galaxy counts in incomplete galaxy surveys. The resulting algorithm can also be used for image reconstruction with low-number photon statistics, e.g. in low-dose X-ray imaging.

The second example was the design of an optimal 
method to measure or constrain any possible local non- 
linearities in the CMB temperature fluctuations. This 
may serve as a blueprint for statistical monitoring of the 
linearity of a signal amplifier. 

We conclude here with a short outlook on some problems that are accessible to the presented theory.

Many signal inference problems involve the reconstruc- 
tion of fields without precisely known statistics. Some 
coefficients in the IFT-Hamiltonians may only be phe- 
nomenological in nature, and therefore have to be de- 
rived from the same data used for the reconstruction 
itself. This more intricate interplay of parameters and information field can also be incorporated into the IFT framework, as we will show in a subsequent work.

For cosmological applications along the lines started in this work, clearly more realistic data models need to be investigated. For example, to describe the response of galaxy formation to the underlying dark matter distribution by a realistic statistical model, to be used in constructing the corresponding IFT Hamiltonian for a dark-matter information field, detailed higher-order correlation coefficients have to be distilled from numerical simulations or semi-analytic descriptions. The CMB Hamiltonian may also benefit from the inclusion of remnants from the CMB foreground subtraction process, which would permit more solid evidence to be gathered on fundamental parameters hidden in the CMB fluctuations, like the amplitude of non-Gaussianities.

Furthermore, there exist a number of more or less heuristic algorithms for inverse problems which have proven to serve well under certain circumstances. Reverse engineering of their implicitly assumed priors and data models may help to understand better under which conditions they are best suited, as well as how to improve them in case these conditions are not exactly met.

Finally, we are very curious to see whether and how the presented framework may be suitable for inference problems in other scientific fields.
APPENDIX A: NOTATION

We briefly summarize our notation of functions in po- 
sition and Fourier space. 

A function $f(x)$ over the $n$-dimensional space, here usually real but in principle also complex, is regarded as a vector $f$ in a discrete and finite-dimensional, or continuous and infinite-dimensional, Hilbert space. $f$ denotes this vector independently of the momentarily chosen function basis, be it the real-space basis, $f(x) = \langle x | f \rangle$, or the Fourier basis,

$$ f(k) = \langle k | f \rangle = \int dx\, f(x)\, e^{ikx}. \qquad (A1) $$

Here, the volume integration is usually performed only over a finite domain with volume $V$. This leads to the convention for the origin of the delta function in $k$-space,

$$ \delta(0) = \frac{V}{(2\pi)^n}, \qquad (A2) $$

and also to a Fourier transformation operator $F = |k\rangle\langle x|$, with $F_{kx} = e^{ikx}$, and its inverse $F^{-1} = |x\rangle\langle k|$, with $F^{-1}_{xk} = e^{-ikx}$. The dagger is used to denote transposed and complex-conjugated objects. We have $(F^\dagger F)_{xy} = 1_{xy}$ as well as $(F F^\dagger)_{kk'} = 1_{kk'}$ for the following definition of the scalar product of two functions $f$ and $g$ in real and Fourier space:

$$ f^\dagger g = \langle f | g \rangle = \int dx\, f^*(x)\, g(x) = \int \frac{dk}{(2\pi)^n}\, f^*(k)\, g(k), \qquad (A3) $$

where the asterisk denotes complex conjugation. The statistical power spectrum of $f$ is denoted by $P_f(k) = \langle |f(k)|^2 \rangle_{(f)} / V$.
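The conventions (A1) to (A3) can be checked on a discretized one-dimensional box ($n = 1$). In this sketch, which is our own and uses hypothetical helper names, the box length $L$ plays the role of $V$, the integral $\int dx$ becomes $\sum_x \Delta x$, and $\int dk/(2\pi)$ becomes $\sum_q 1/L$ over the modes $k_q = 2\pi q/L$. It confirms that both forms of the scalar product agree, that $F^{-1}$ undoes $F$, and that $(FF^\dagger)_{kk'} = L\,\delta_{qq'}$, which is the discrete counterpart of $2\pi\,\delta(0) = V$.

```python
import cmath
import math


def fourier(f_x, length):
    """Discretized (A1): f(k_q) = sum_x dx f(x) exp(+i k_q x), with k_q = 2 pi q / L."""
    n = len(f_x)
    dx = length / n
    return [sum(dx * f_x[i] * cmath.exp(1j * (2 * math.pi * q / length) * (i * dx)) for i in range(n))
            for q in range(n)]


def inverse_fourier(f_k, length):
    """Inverse transform with F^-1_{xk} = exp(-i k x) and measure dk/(2 pi) -> 1/L."""
    n = len(f_k)
    dx = length / n
    return [sum(f_k[q] * cmath.exp(-1j * (2 * math.pi * q / length) * (i * dx)) for q in range(n)) / length
            for i in range(n)]


def dot_x(f, g, length):
    """Position-space scalar product, int dx f*(x) g(x)."""
    dx = length / len(f)
    return sum(dx * a.conjugate() * b for a, b in zip(f, g))


def dot_k(f, g, length):
    """Fourier-space scalar product, int dk/(2 pi) f*(k) g(k) -> (1/L) sum_q."""
    return sum(a.conjugate() * b for a, b in zip(f, g)) / length


def ff_dagger(q1, q2, length, n):
    """(F F^dagger)_{k1 k2} = sum_x dx exp(i (k1 - k2) x): equals L if q1 == q2, else 0."""
    dx = length / n
    return sum(dx * cmath.exp(2j * math.pi * (q1 - q2) * i / n) for i in range(n))


length, n = 3.0, 16
dx = length / n
f_x = [complex(math.exp(-(i * dx - 1.5) ** 2)) for i in range(n)]
g_x = [complex(math.sin(2 * math.pi * i * dx / length) + 0.3) for i in range(n)]
f_k, g_k = fourier(f_x, length), fourier(g_x, length)

delta_origin = ff_dagger(0, 0, length, n).real / (2 * math.pi)   # V / (2 pi)^n with n = 1
print(dot_x(f_x, g_x, length), dot_k(f_k, g_k, length), delta_origin)
```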

We also introduce for convenience the position-space component-wise product of two functions,

$$ (fg)(x) = f(x)\, g(x), \qquad (A4) $$

which also permits compact notations like

$$ (\log f)(x) = \log(f(x)), \qquad (f/g)(x) = f(x)/g(x), \qquad (A5) $$

and the like. The component-wise product should not be confused with the tensor product of two vectors, $(f g^\dagger)(x,y) = f(x)\, g^*(y)$.

The diagonal components of a matrix $M$ in position-space representation form a vector, which we denote by

$$ \hat{M} = \mathrm{diag}_x M, \quad \text{with} \quad \hat{M}_x = M_{xx}. \qquad (A6) $$

Similarly, a diagonal matrix in position-space representation, whose diagonal components are given by a vector $f$, will be denoted by

$$ \hat{f} = \mathrm{diag}_x f, \quad \text{with} \quad \hat{f}_{xy} = f_x\, 1_{xy}. \qquad (A7) $$

Thus, $\hat{\hat{M}} = M$ if and only if $M$ is diagonal, and $\hat{\hat{f}} = f$ always.
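The two uses of the hat in (A6) and (A7) can be made concrete with plain lists. The helper names below are hypothetical; the sketch only restates the definitions, including the fact that the diagonal of the tensor product $f g^\dagger$ is the component-wise product $f g^*$ of (A4).

```python
def diag_of(matrix):
    """hat{M} for a matrix: the vector of its diagonal entries, hat{M}_x = M_xx (A6)."""
    return [row[i] for i, row in enumerate(matrix)]


def diag_matrix(vec):
    """hat{f} for a vector: the diagonal matrix hat{f}_xy = f_x 1_xy (A7)."""
    n = len(vec)
    return [[vec[x] if x == y else 0.0 for y in range(n)] for x in range(n)]


f = [1.0, -2.0, 0.5]
g = [2.0, 1.0, 4.0]
M_full = [[1.0, 0.2, 0.0], [0.2, 3.0, 0.1], [0.0, 0.1, 2.0]]
M_diag = diag_matrix([1.0, 3.0, 2.0])

componentwise = [a * b for a, b in zip(f, g)]          # (fg)(x) = f(x) g(x), Eq. (A4)
tensor = [[a * b for b in g] for a in f]               # (f g^dagger)(x, y) = f(x) g*(y), real g

print(diag_of(diag_matrix(f)) == f)                    # True: hat-hat f = f always
print(diag_matrix(diag_of(M_diag)) == M_diag)          # True: M is diagonal
print(diag_matrix(diag_of(M_full)) == M_full)          # False: off-diagonal entries are lost
print(diag_of(tensor) == componentwise)                # True: diagonal of f g^dagger is fg
```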

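In our notation a multivariate Gaussian reads

$$ \mathcal{G}(s, S) = \frac{1}{|2\pi S|^{1/2}} \exp\left(-\tfrac{1}{2}\, s^\dagger S^{-1} s\right). \qquad (A8) $$

Here, $S = \langle s s^\dagger \rangle_{(s)}$ denotes the covariance tensor of the Gaussian field $s$, which is drawn from $P(s) = \mathcal{G}(s, S)$. If $s$ is statistically homogeneous, $S$ is fully described by the power spectrum $P_s(k)$:

$$ S^{-1}_{kk'} = (2\pi)^n\, \delta(k - k')\, \left(P_s(k)\right)^{-1}. \qquad (A9) $$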

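The Fourier representation of the trace of a Fourier-diagonal operator, $A_{kk'} = (2\pi)^n\, \delta(k - k')\, P_A(k)$,

$$ \mathrm{Tr}(A) = \int dx\, A_{xx} = V \int \frac{dk}{(2\pi)^n}\, P_A(k), \qquad (A10) $$

is very useful in combination with the following expression for the determinant of a (positive-definite) Hermitian matrix,

$$ \log |A| = \mathrm{Tr}(\log A). \qquad (A11) $$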
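For a statistically homogeneous field, (A9) to (A11) let one evaluate the Gaussian (A8) from the power spectrum alone. The sketch below is our own discretized illustration: a periodic grid of $n$ pixels with a circulant covariance whose eigenvalues are `power[k]`, so that the discrete analogue of (A10) and (A11) reads $\log|S| = \sum_k \log P_s(k)$ and $s^\dagger S^{-1} s = n^{-1}\sum_k |\hat{s}_k|^2 / P_s(k)$, with $\hat{s}_k$ the unnormalized discrete Fourier transform. The spectrum, the test vector, and all function names are made up for illustration; the direct evaluation by Gaussian elimination serves as a cross-check.

```python
import cmath
import math


def circulant_cov(power):
    """Homogeneous covariance S_xy on a periodic grid; eigenvalues are power[k] (power[k] == power[-k])."""
    n = len(power)
    return [[sum(power[k] * math.cos(2 * math.pi * k * (x - y) / n) for k in range(n)) / n
             for y in range(n)] for x in range(n)]


def log_det_direct(a):
    """log|A| via Gaussian elimination without pivoting (A symmetric positive definite)."""
    a = [row[:] for row in a]
    n = len(a)
    total = 0.0
    for col in range(n):
        total += math.log(a[col][col])
        for r in range(col + 1, n):
            fac = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= fac * a[col][c]
    return total


def solve(a, b):
    """Solve A x = b by Gaussian elimination without pivoting (A symmetric positive definite)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        for r in range(col + 1, n):
            fac = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= fac * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def log_gauss_direct(s, cov):
    """log G(s, S) of (A8) from the full covariance matrix."""
    chi2 = sum(u * v for u, v in zip(s, solve(cov, s)))
    return -0.5 * chi2 - 0.5 * (len(s) * math.log(2 * math.pi) + log_det_direct(cov))


def log_gauss_fourier(s, power):
    """log G(s, S) using only the power spectrum: S^-1 and log S are diagonal in Fourier space."""
    n = len(s)
    s_hat = [sum(s[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n)) for k in range(n)]
    chi2 = sum(abs(s_hat[k]) ** 2 / power[k] for k in range(n)) / n
    log_det = sum(math.log(p) for p in power)            # Tr log S = sum_k log P_s(k)
    return -0.5 * chi2 - 0.5 * (n * math.log(2 * math.pi) + log_det)


n_pix = 8
power = [2.0 / (1.0 + min(k, n_pix - k) ** 2) for k in range(n_pix)]   # made-up spectrum
S = circulant_cov(power)
s = [0.5, -0.3, 1.2, 0.1, -0.8, 0.4, 0.0, -0.6]                          # made-up field values
print(log_gauss_direct(s, S), log_gauss_fourier(s, power))
```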

Furthermore, we usually suppress the dependency of probabilities on the underlying model $I$ and its parameters $\theta$ in our notation, i.e. instead of $P(s|\theta, I)$ we just write $P(s)$ or $P(s|\theta)$, depending on our focus. Here $\theta = (S, N, R, \ldots)$ contains all the parameters of the model, which are assumed to be known within this work.



APPENDIX B: FEYNMAN RULES ON THE 
SPHERE 

Here, we provide the Feynman rules on the sphere. The real-space rules are identical to those of flat spaces, with just the scalar product replaced by the integral over the sphere, etc. If the problem at hand has an isotropic propagator, which depends only on the distance between two points on the sphere but not on their location or orientation, the propagator is diagonal when expressed in spherical harmonics $Y_{lm}(x)$. Thanks to the orthogonality relations of spherical harmonics, we have for $x, y \in S^2$

$$ (Y Y^\dagger)_{xy} = \sum_{l,m} Y_{lm}(x)\, Y^*_{lm}(y) = \delta(x - y) = (1)_{xy} \qquad (B1) $$

and

$$ (Y^\dagger Y)_{(l,m)(l',m')} = \int dx\, Y^*_{lm}(x)\, Y_{l'm'}(x) = \delta_{ll'}\, \delta_{mm'} = (1)_{(l,m)(l',m')}. \qquad (B2) $$



Therefore, we can insert real-space identity matrices $1 = Y Y^\dagger$ between any two terms of a real-space diagrammatic expression, assigning $Y^\dagger$ to the right-hand and $Y$ to the left-hand term. This way we find the spherical-harmonics Feynman rules, which are very similar to the Fourier-space ones in that they also require directed propagator lines for proper angular-momentum conservation. For a theory with only local interactions, these read:

1. An open end of a line has external (not summed) angular-momentum quantum numbers $(l, m)$.

2. A line connecting momentum $(l, m)$ with momentum $(l', m')$ corresponds to a propagator between these momenta: $D_{(l,m)(l',m')} = C_D(l)\, \delta_{ll'}\, \delta_{mm'}$, where $C_D(l)$ is the angular power spectrum of the propagator.

3. A data source vertex is $(j + J - \Lambda^{(1)})(l, m)$, where $(l, m)$ is the angular momentum at the data end of the line.

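4. A vertex with quantum numbers $(l_0, m_0)$ with $n_\mathrm{in}$ incoming and $n_\mathrm{out}$ outgoing lines ($n_\mathrm{in} + n_\mathrm{out} > 1$) with momentum labels $(l_1, m_1) \ldots (l_{n_\mathrm{in}}, m_{n_\mathrm{in}})$ and $(l'_1, m'_1) \ldots (l'_{n_\mathrm{out}}, m'_{n_\mathrm{out}})$, respectively, is given by $-\Lambda^{(n)}(l_0, m_0)$ with $n = n_\mathrm{in} + n_\mathrm{out}$, multiplied by the coefficient defined in Eq. (B3) for these quantum numbers.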

5. An internal vertex has internal (summed) angular-momentum quantum numbers $(l', m')$. Summation means a term $\sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'}$ in front of the expression.

6. The expression gets divided by the symmetry factor 
of its diagram. 
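Rule 2 can be illustrated numerically: a propagator that is diagonal in $(l, m)$ with spectrum $C_D(l)$ corresponds in real space to a kernel that depends only on the angle between $x$ and $y$. The minimal sketch below, our own and with a made-up spectrum `c_d`, builds $Y_{lm}$ with the Condon-Shortley phase convention (an assumption, since the convention is not fixed in the text), sums $Y_{lm}(x)\,C_D(l)\,Y^*_{lm}(y)$, and compares the result with the addition-theorem form $\sum_l (2l+1)/(4\pi)\,C_D(l)\,P_l(\cos\gamma)$.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, with the Condon-Shortley phase."""
    pmm = 1.0
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    odd = 1.0
    for _ in range(m):
        pmm *= -odd * somx2          # builds (-1)^m (2m-1)!! (1-x^2)^(m/2)
        odd += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):   # upward recurrence in l at fixed m
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def ylm(l, m, theta, phi):
    """Complex spherical harmonic Y_lm(theta, phi), using Y_{l,-m} = (-1)^m Y*_{lm}."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def propagator_harmonic(c_d, x, y):
    """sum_{l,m} Y_lm(x) C_D(l) Y*_lm(y): a propagator diagonal in (l, m) (rule 2)."""
    return sum(ylm(l, m, *x) * c * ylm(l, m, *y).conjugate()
               for l, c in enumerate(c_d) for m in range(-l, l + 1))


def cos_angle(x, y):
    (t1, p1), (t2, p2) = x, y
    return math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)


def propagator_angle(c_d, x, y):
    """Same kernel via the addition theorem; depends only on the angle between x and y."""
    c = cos_angle(x, y)
    return sum((2 * l + 1) / (4 * math.pi) * cl * assoc_legendre(l, 0, c)
               for l, cl in enumerate(c_d))


c_d = [1.0 / (1 + l) ** 2 for l in range(6)]    # made-up angular power spectrum, l <= 5
x, y = (0.7, 0.3), (1.9, 2.4)                     # (theta, phi) pairs on the sphere
print(propagator_harmonic(c_d, x, y), propagator_angle(c_d, x, y))
```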

The interaction structure in spherical-harmonics space is complicated by the non-orthogonality of powers and products of the spherical harmonic functions. This is in contrast to the Fourier-space case, where any power or product of Fourier basis functions is again a single Fourier basis function.

The spherical structure is encapsulated in the coeffi- 
cients 

<:::;:;;:::i:ij'^-y^- (i[T^,™,(^)) (n ^k(-))' 

(B3) 

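which can be expressed in terms of sums and products of Wigner coefficients, thanks to the relations $Y^*_{lm}(x) = (-1)^m\, Y_{l,-m}(x)$,

$$ Y_{l_1 m_1}(x)\, Y_{l_2 m_2}(x) = \sum_{lm} \sqrt{\frac{(2l_1+1)(2l_2+1)(2l+1)}{4\pi}} \begin{pmatrix} l_1 & l_2 & l \\ m_1 & m_2 & m \end{pmatrix} \begin{pmatrix} l_1 & l_2 & l \\ 0 & 0 & 0 \end{pmatrix} Y^*_{lm}(x), $$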
and the orthogonality relation in Eq. (B2), to be applied successively in this order. Due to this complication, it is probably most efficient to calculate propagation in spherical-harmonics space, but to change back to real space for any interaction vertex of high order.
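The product relation for spherical harmonics,

$$ Y_{l_1 m_1} Y_{l_2 m_2} = \sum_{lm} \sqrt{\frac{(2l_1+1)(2l_2+1)(2l+1)}{4\pi}} \begin{pmatrix} l_1 & l_2 & l \\ m_1 & m_2 & m \end{pmatrix} \begin{pmatrix} l_1 & l_2 & l \\ 0 & 0 & 0 \end{pmatrix} Y^*_{lm}, $$

can be checked pointwise. The sketch below is our own: it computes Wigner 3j symbols from the Racah formula for integer arguments, uses Condon-Shortley spherical harmonics (an assumption), and compares both sides for a few low multipoles at one point on the sphere. Helper names are hypothetical.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_l^m(x), 0 <= m <= l, with the Condon-Shortley phase."""
    pmm, odd = 1.0, 1.0
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    for _ in range(m):
        pmm *= -odd * somx2
        odd += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def ylm(l, m, theta, phi):
    """Complex spherical harmonic; negative m via Y_{l,-m} = (-1)^m Y*_{lm}."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def wigner_3j(j1, j2, j3, m1, m2, m3):
    """Wigner 3j symbol for integer arguments, from the Racah formula."""
    if m1 + m2 + m3 != 0 or not abs(j1 - j2) <= j3 <= j1 + j2:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    f = math.factorial
    triangle = f(j1 + j2 - j3) * f(j1 - j2 + j3) * f(-j1 + j2 + j3) / f(j1 + j2 + j3 + 1)
    moments = f(j1 + m1) * f(j1 - m1) * f(j2 + m2) * f(j2 - m2) * f(j3 + m3) * f(j3 - m3)
    total = 0.0
    k_min = max(0, j2 - j3 - m1, j1 - j3 + m2)
    k_max = min(j1 + j2 - j3, j1 - m1, j2 + m2)
    for k in range(k_min, k_max + 1):
        total += (-1) ** k / (f(k) * f(j3 - j2 + m1 + k) * f(j3 - j1 - m2 + k)
                              * f(j1 + j2 - j3 - k) * f(j1 - m1 - k) * f(j2 + m2 - k))
    return (-1) ** (j1 - j2 - m3) * math.sqrt(triangle * moments) * total


def product_expansion(l1, m1, l2, m2, theta, phi):
    """Right-hand side of the product relation: Y*_lm weighted by Gaunt-type coefficients."""
    m = -(m1 + m2)                               # the 3j symbol vanishes unless m1 + m2 + m = 0
    total = 0j
    for l in range(abs(l1 - l2), l1 + l2 + 1):
        if abs(m) > l:
            continue
        weight = math.sqrt((2 * l1 + 1) * (2 * l2 + 1) * (2 * l + 1) / (4 * math.pi))
        weight *= wigner_3j(l1, l2, l, m1, m2, m) * wigner_3j(l1, l2, l, 0, 0, 0)
        total += weight * ylm(l, m, theta, phi).conjugate()
    return total


theta, phi = 0.8, 1.1
for l1, m1, l2, m2 in [(1, 1, 1, 0), (1, -1, 2, 1), (2, 0, 2, 0), (2, 2, 1, -1)]:
    lhs = ylm(l1, m1, theta, phi) * ylm(l2, m2, theta, phi)
    print(l1, m1, l2, m2, abs(lhs - product_expansion(l1, m1, l2, m2, theta, phi)))
```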



APPENDIX C: f_nl-PROPAGATOR

We provide in the following the individual terms of the $f_\mathrm{nl}$-propagator in Eq. (142). The individual diagrams are all $\mathcal{O}(f^2)$ and are given here for the case $f = 1$:



-Tr [D^M] - ^D^MD 



2DM + DM 



D 



2MD + MD 



O 



Tr [D^ MDM] 



-2m^MD^j - 4Tr 



mDjD M 



(CI) 
(C2) 

(C3) 
(C4) 
(C5) 



m^M D'^ Mm + 4 Tr mDMmDM 



+2 Tr [fh D {fh M + M ffi) D M] (C6) 



= -2 



2DM + DM 



D j m 



X3 



m^^M D -2TT[mM mD] 



(C7) 
(C8) 



2DM + DM D [2mMm + Mm^] (C9) 

- [2mMm + Mm'^] ^D [2mMm + Mm^] 

(CIO) 

-2{mi)'^D{Mm^ + 2mMm) (Cll) 



ni^^ M rr? 

2 

2 (jn])^ D (j m) 



(C12) 
(C13) 



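We used here the conventions $m = D j$ and $(D^2)_{xy} = (D_{xy})^2$, and recall that $\Lambda^{(0)} = \Lambda^{(1)} = 0$, $\Lambda^{(2)} = -2 f \hat{j}$, $\Lambda^{(3)}_{xyz} = f\,[M_{xy}\delta_{yz} + 5\ \text{perm.}]$, and $\Lambda^{(4)}_{xyzu} = \tfrac{f^2}{2}\,[\delta_{xy} M_{yz} \delta_{zu} + 23\ \text{perm.}]$.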